Predicting Liver Disease Using Machine Learning: A Data-Driven Approach¶

Problem Statement¶

Liver diseases are a significant global health concern, affecting millions of people and leading to severe complications if not diagnosed early. Traditional diagnostic methods rely heavily on laboratory tests and clinical expertise, which may be time-consuming and require specialized resources. There is a need for an efficient, data-driven approach to predict liver disease accurately using patient records. This study aims to leverage machine learning models to develop a predictive system that can assist healthcare professionals in diagnosing liver disease based on clinical and demographic data.

Context¶

Liver disease is a broad term that includes conditions such as fatty liver, hepatitis, cirrhosis, and liver cancer. These diseases are influenced by various factors, including alcohol consumption, viral infections, metabolic disorders, and genetic predisposition. Early detection and timely intervention can significantly improve patient outcomes.

This dataset, comprising 30,691 patient records with 11 clinical features, provides an opportunity to develop an automated predictive model. The dataset includes laboratory test results, demographic information, and other biomarkers crucial for detecting liver abnormalities. By applying machine learning techniques, we can enhance diagnostic accuracy, reduce misclassification rates, and support medical professionals in clinical decision-making.

Objective¶

  • Develop a binary classification model to predict whether a patient has liver disease (1) or not (0).
  • Explore various machine learning algorithms to determine the most effective model for liver disease prediction.
  • Analyze feature importance to identify the key clinical markers contributing to liver disease detection.
  • Improve predictive accuracy using feature engineering, hyperparameter tuning, and ensemble learning techniques.
  • Provide a framework for deploying a real-world decision support system that can assist healthcare professionals in diagnosing liver disease efficiently.

Dataset¶

Liver Disease Patient Dataset 30K train data

  • archive.zip
    • Age Age of the patient
    • Gender Gender of the patient
    • TB Total Bilirubin
      • Bilirubin is a yellow pigment formed during red blood cell breakdown.
      • High levels indicate potential liver dysfunction or bile duct obstruction.
    • DB Direct Bilirubin
      • A fraction of total bilirubin that is water-soluble.
      • Elevated direct bilirubin suggests obstructive jaundice or hepatitis.
    • Alkphos Alkaline Phosphotase
      • An enzyme found in the liver, bones, and bile ducts.
      • High ALP levels may indicate cholestasis (bile blockage), liver disease, or bone disorders.
    • Sgpt Alamine Aminotransferase
      • An enzyme found in liver cells.
      • Elevated ALT levels suggest liver cell damage (hepatitis, fatty liver, or alcohol-related liver disease).
    • Sgot Aspartate Aminotransferase
      • Another enzyme in the liver and muscles.
      • High SGOT levels indicate liver or muscle damage.
    • TP Total Protiens
      • Sum of albumin & globulin proteins in the blood.
      • Low protein levels may indicate malnutrition, liver, or kidney disease.
    • ALB Albumin
      • A protein made by the liver that helps maintain blood volume and transport nutrients.
      • Low albumin is a marker of chronic liver disease, malnutrition, or kidney disorders.
    • A/G Ratio Albumin and Globulin Ratio
      • Measures the balance between albumin & globulin proteins.
      • Low A/G ratio can indicate chronic liver disease, autoimmune disorders, or inflammation.
    • Result Selector field used to split the data into two sets (labeled by the experts)
      • 1 Liver Patient → Patients diagnosed with liver disease.
      • 2 Non-Liver Patient → Patients without liver disease.
In [1]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Import Libraries¶

Run, restart kernel, comment out this cell, then run all

In [2]:
# # Uninstall conflicting packages completely
# !pip uninstall -y numpy scikit-learn imbalanced-learn scikeras tensorflow torch matplotlib seaborn pandas scipy dask-ml

# # Upgrade pip, setuptools, and wheel
# !pip install --upgrade pip setuptools wheel

# # Purge pip cache to remove broken package installs
# !pip cache purge

# # Install compatible versions of required packages
# !pip install --no-cache-dir numpy==1.26.4 scikit-learn==1.4.2 imbalanced-learn==0.13.0 \
#                           scikeras==0.13.0 tensorflow==2.18.0 torch==2.6.0 torchvision \
#                           torchaudio matplotlib==3.7.1 seaborn pandas scipy fastai dask-ml

# # Install any additional dependencies for imbalanced-learn (SMOTE)
# !pip install --no-cache-dir imbalanced-learn

# # 🚨 STOP HERE 🚨
# print("\n⚠️  Restart the notebook kernel NOW before running anything else.")

Run next cell

In [3]:
# ---------------------------------
# After restarting, run the following:
# ---------------------------------

# Import essential libraries and verify versions
import numpy as np
import sklearn
import tensorflow as tf
import torch
import imblearn
import scikeras
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import scipy
import dask_ml
from tensorflow.keras import backend

# Fixing the seed for random number generators to ensure reproducibility
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)

# Suppress warnings for a clean output
import warnings
warnings.filterwarnings("ignore")

# ✅ Verify Installed Versions
print("\n✅ Installed Versions:")
print("NumPy:", np.__version__)
print("SciKeras:", scikeras.__version__)
print("Scikit-Learn:", sklearn.__version__)
print("TensorFlow:", tf.__version__)
print("PyTorch:", torch.__version__)
print("Imbalanced-Learn:", imblearn.__version__)
print("Matplotlib:", matplotlib.__version__)
print("Seaborn:", sns.__version__)
print("Pandas:", pd.__version__)
print("Scipy:", scipy.__version__)
print("Dask-ML:", dask_ml.__version__)
✅ Installed Versions:
NumPy: 1.26.4
SciKeras: 0.13.0
Scikit-Learn: 1.4.2
TensorFlow: 2.18.0
PyTorch: 2.6.0+cu124
Imbalanced-Learn: 0.13.0
Matplotlib: 3.7.1
Seaborn: 0.13.2
Pandas: 2.2.3
Scipy: 1.15.2
Dask-ML: 2024.4.4
In [4]:
# pip check
In [5]:
# !pip install scikeras
In [6]:
# import pandas as pd
# import numpy as np

# from sklearn.model_selection import train_test_split
# from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# from sklearn import model_selection
# from sklearn.compose import ColumnTransformer
# import matplotlib.pyplot as plt
# import seaborn as sns
# from sklearn.impute import SimpleImputer
# import warnings
# from sklearn.metrics import confusion_matrix
# from sklearn.pipeline import Pipeline
# from sklearn.model_selection import GridSearchCV
# from sklearn.model_selection import RandomizedSearchCV
# import tensorflow as tf # deep learning
# from tensorflow.keras.models import Sequential
# from tensorflow.keras.layers import Dense
# from tensorflow.keras.layers import Dense, Input, Dropout,BatchNormalization
# from scikeras.wrappers import KerasClassifier

# import random
# from tensorflow.keras import backend
# random.seed(1)
# np.random.seed(1)
# tf.random.set_seed(1)
# warnings.filterwarnings("ignore")

Loading the Data¶

Unzip

In [7]:
import zipfile

# Define the path to your zip file
zip_path = "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/archive.zip"

# Define your chosen extraction directory
extract_to = "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/"  # Replace with your desired directory

# Extract the files to the specified directory
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_to)

print(f"Files extracted to: {extract_to}")
Files extracted to: /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/

Load

In [8]:
import pandas as pd

# Correctly load the CSV file with proper encoding
df_train = pd.read_csv(
    "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Liver Patient Dataset (LPD)_train.csv",
    encoding="ISO-8859-1"
)

# Correctly load the Excel file
df_test = pd.read_excel(
    "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/test.csv.xlsx"
)
In [9]:
df_train.shape, df_test.shape
Out[9]:
((30691, 11), (2109, 10))
In [10]:
df_train.columns.to_list()
Out[10]:
['Age of the patient',
 'Gender of the patient',
 'Total Bilirubin',
 'Direct Bilirubin',
 '\xa0Alkphos Alkaline Phosphotase',
 '\xa0Sgpt Alamine Aminotransferase',
 'Sgot Aspartate Aminotransferase',
 'Total Protiens',
 '\xa0ALB Albumin',
 'A/G Ratio Albumin and Globulin Ratio',
 'Result']
In [11]:
df_test.columns.to_list()
Out[11]:
[66, 'Female', 0.9, 0.2, 210, 35, 32, 8, 3.9, '0.9.1']
In [12]:
df_train.head()
Out[12]:
Age of the patient Gender of the patient Total Bilirubin Direct Bilirubin Alkphos Alkaline Phosphotase Sgpt Alamine Aminotransferase Sgot Aspartate Aminotransferase Total Protiens ALB Albumin A/G Ratio Albumin and Globulin Ratio Result
0 65.0 Female 0.7 0.1 187.0 16.0 18.0 6.8 3.3 0.90 1
1 62.0 Male 10.9 5.5 699.0 64.0 100.0 7.5 3.2 0.74 1
2 62.0 Male 7.3 4.1 490.0 60.0 68.0 7.0 3.3 0.89 1
3 58.0 Male 1.0 0.4 182.0 14.0 20.0 6.8 3.4 1.00 1
4 72.0 Male 3.9 2.0 195.0 27.0 59.0 7.3 2.4 0.40 1
In [13]:
df_test.head()
Out[13]:
66 Female 0.9 0.2 210 35 32 8 3.9 0.9.1
0 50 Male 9.4 5.2 268 21 63 6.4 2.8 0.8
1 42 Female 3.5 1.6 298 68 200 7.1 3.4 0.9
2 65 Male 1.7 0.8 315 12 38 6.3 2.1 0.5
3 22 Male 3.3 1.5 214 54 152 5.1 1.8 0.5
4 31 Female 1.1 0.3 138 14 21 7.0 3.8 1.1
  • Unknown column names in test data, will not use

Data Overview¶

In [14]:
data = df_train.copy()
In [15]:
data.shape
Out[15]:
(30691, 11)
In [16]:
data.head(5)
Out[16]:
Age of the patient Gender of the patient Total Bilirubin Direct Bilirubin Alkphos Alkaline Phosphotase Sgpt Alamine Aminotransferase Sgot Aspartate Aminotransferase Total Protiens ALB Albumin A/G Ratio Albumin and Globulin Ratio Result
0 65.0 Female 0.7 0.1 187.0 16.0 18.0 6.8 3.3 0.90 1
1 62.0 Male 10.9 5.5 699.0 64.0 100.0 7.5 3.2 0.74 1
2 62.0 Male 7.3 4.1 490.0 60.0 68.0 7.0 3.3 0.89 1
3 58.0 Male 1.0 0.4 182.0 14.0 20.0 6.8 3.4 1.00 1
4 72.0 Male 3.9 2.0 195.0 27.0 59.0 7.3 2.4 0.40 1
In [17]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30691 entries, 0 to 30690
Data columns (total 11 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   Age of the patient                    30689 non-null  float64
 1   Gender of the patient                 29789 non-null  object 
 2   Total Bilirubin                       30043 non-null  float64
 3   Direct Bilirubin                      30130 non-null  float64
 4    Alkphos Alkaline Phosphotase         29895 non-null  float64
 5    Sgpt Alamine Aminotransferase        30153 non-null  float64
 6   Sgot Aspartate Aminotransferase       30229 non-null  float64
 7   Total Protiens                        30228 non-null  float64
 8    ALB Albumin                          30197 non-null  float64
 9   A/G Ratio Albumin and Globulin Ratio  30132 non-null  float64
 10  Result                                30691 non-null  int64  
dtypes: float64(9), int64(1), object(1)
memory usage: 2.6+ MB
  • 30,691 Observatons
  • 11 columns
In [18]:
data.dtypes.value_counts()
Out[18]:
count
float64 9
object 1
int64 1

Check for duplicated data

In [19]:
data.duplicated().sum()
Out[19]:
11323

Check for null data

In [20]:
data.isnull().sum()
Out[20]:
0
Age of the patient 2
Gender of the patient 902
Total Bilirubin 648
Direct Bilirubin 561
Alkphos Alkaline Phosphotase 796
Sgpt Alamine Aminotransferase 538
Sgot Aspartate Aminotransferase 462
Total Protiens 463
ALB Albumin 494
A/G Ratio Albumin and Globulin Ratio 559
Result 0

In [21]:
data.isnull().sum().sum()
Out[21]:
5425
In [22]:
# Let's check for missing values in the data
round(data.isnull().sum() / data.isnull().count() * 100, 2) #  calculates the percentage of missing values in each column of the DataFrame
Out[22]:
0
Age of the patient 0.01
Gender of the patient 2.94
Total Bilirubin 2.11
Direct Bilirubin 1.83
Alkphos Alkaline Phosphotase 2.59
Sgpt Alamine Aminotransferase 1.75
Sgot Aspartate Aminotransferase 1.51
Total Protiens 1.51
ALB Albumin 1.61
A/G Ratio Albumin and Globulin Ratio 1.82
Result 0.00

Get the proportion of unique values in the "Target" column

In [23]:
# get the proportion of unique values in the "Target" column
data["Result"].value_counts(0), data["Result"].value_counts(1)
Out[23]:
(Result
 1    21917
 2     8774
 Name: count, dtype: int64,
 Result
 1    0.714118
 2    0.285882
 Name: proportion, dtype: float64)
  • 1 Liver Patient

  • 2 Non Liver Patient

  • Not balanced, will be hard to predict

In [24]:
data.describe().T
Out[24]:
count mean std min 25% 50% 75% max
Age of the patient 30689.0 44.107205 15.981043 4.0 32.0 45.0 55.0 90.0
Total Bilirubin 30043.0 3.370319 6.255522 0.4 0.8 1.0 2.7 75.0
Direct Bilirubin 30130.0 1.528042 2.869592 0.1 0.2 0.3 1.3 19.7
Alkphos Alkaline Phosphotase 29895.0 289.075364 238.537589 63.0 175.0 209.0 298.0 2110.0
Sgpt Alamine Aminotransferase 30153.0 81.488641 182.158850 10.0 23.0 35.0 62.0 2000.0
Sgot Aspartate Aminotransferase 30229.0 111.469979 280.851078 10.0 26.0 42.0 88.0 4929.0
Total Protiens 30228.0 6.480237 1.081980 2.7 5.8 6.6 7.2 9.6
ALB Albumin 30197.0 3.130142 0.792281 0.9 2.6 3.1 3.8 5.5
A/G Ratio Albumin and Globulin Ratio 30132.0 0.943467 0.323164 0.3 0.7 0.9 1.1 2.8
Result 30691.0 1.285882 0.451841 1.0 1.0 1.0 2.0 2.0
  • The dataset contains 30,689 records for "Age of the Patient."
  • Other medical attributes such as bilirubin levels, enzyme levels, and protein ratios have slightly fewer records, indicating some missing values.
  • The Result column likely represents a binary classification outcome (1 or 2).

Outliers & Variability

  • Features like Sgpt, Sgot, and Bilirubin levels show extreme max values. Consider handling outliers through log transformations or winsorization.

Missing Data

  • Some features have missing values (e.g., Total Bilirubin, ALB Albumin).
  • Use imputation techniques (mean/median imputation or predictive modeling) to fill gaps.

Feature Importance

  • "A/G Ratio" and "Bilirubin" are crucial indicators of liver disease.
  • Consider correlation analysis and feature selection before modeling.

Class Imbalance Check

  • The Result column (1 or 2) should be analyzed for class distribution.
  • If imbalanced, consider SMOTE (Synthetic Minority Over-sampling Technique) or class weighting in models.

Unique Values

In [25]:
data.nunique()
Out[25]:
0
Age of the patient 77
Gender of the patient 2
Total Bilirubin 113
Direct Bilirubin 80
Alkphos Alkaline Phosphotase 263
Sgpt Alamine Aminotransferase 152
Sgot Aspartate Aminotransferase 177
Total Protiens 58
ALB Albumin 40
A/G Ratio Albumin and Globulin Ratio 69
Result 2

In [26]:
data['Gender of the patient'].value_counts()
Out[26]:
count
Gender of the patient
Male 21986
Female 7803

EDA¶

Univariate Analysis¶

In [27]:
# Function to plot a boxplot and a histogram along the same scale.

def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to the show density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a star will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
In [28]:
histogram_boxplot(data, "Age of the patient")
No description has been provided for this image
  • Majority of patients are between 30-60 years old, with the highest concentration near 40-45 years.
  • Few elderly patients (80+), but they appear as mild outliers.
  • The second peak at 55+ suggests an additional age cluster, possibly due to different medical conditions or risk factors.
In [29]:
histogram_boxplot(data, "Total Bilirubin")
No description has been provided for this image
  • Most patients have normal bilirubin levels (below 2.7), but there are extreme cases.
  • Strong right-skewness suggests abnormal values that could be from severe medical conditions.
In [30]:
histogram_boxplot(data, "Direct Bilirubin")
No description has been provided for this image
  • Most patients have normal direct bilirubin levels (below 2.0), but extreme cases exist.
  • A strong right-skewed distribution with high outliers suggests severe liver disease.
In [31]:
histogram_boxplot(data, "\xa0Alkphos Alkaline Phosphotase")
No description has been provided for this image
  • Most patients have ALP levels between 100-400 U/L, but there are extreme cases.
  • Strong right-skewed distribution suggests abnormally high ALP values in some patients.
In [32]:
histogram_boxplot(data, "\xa0Sgpt Alamine Aminotransferase")
No description has been provided for this image
  • Most patients have SGPT levels below 120 U/L, but some extreme cases exceed 2,000 U/L.
  • Strong right-skewed distribution suggests a large number of high outliers.
In [33]:
histogram_boxplot(data, "Sgot Aspartate Aminotransferase")
No description has been provided for this image
  • Most patients have SGOT levels below 100 U/L, but some extreme cases exceed 5,000 U/L.
  • Strong right-skewed distribution with many high outliers.
In [34]:
histogram_boxplot(data, "Total Protiens")
No description has been provided for this image
  • Most patients have total protein levels between 6.0 - 7.5 g/dL, aligning with normal protein range.
  • Slight right skewness, but the distribution is mostly normal.
In [35]:
histogram_boxplot(data, "\xa0ALB Albumin")
No description has been provided for this image
  • Most patients have albumin levels between 2.5 - 3.8 g/dL, aligning with normal protein range.
  • Slight right skewness, but the distribution is mostly normal.
In [36]:
histogram_boxplot(data, "A/G Ratio Albumin and Globulin Ratio")
No description has been provided for this image
  • Most patients have an A/G ratio between 0.7 - 1.1, which is within the normal range.
  • Slight right skewness, but the distribution is mostly normal.
In [37]:
histogram_boxplot(data, "Result")
No description has been provided for this image
  • "Result" is a binary target variable (1 vs. 2).
  • Class 1 dominates, meaning the dataset is imbalanced.
In [38]:
# Function to create labeled barplots


def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot
In [39]:
labeled_barplot(data, "Gender of the patient")
No description has been provided for this image
In [40]:
# Ensure the "Result" column exists before plotting
if "Result" in data.columns:
    # Count values in the "Result" column
    result_counts = data["Result"].value_counts()

    # Labels for the pie chart
    labels = ["Liver Patient", "Non-Liver Patient"]

    # Sizes for the pie chart (count of each category)
    sizes = [result_counts[1], result_counts[2]]

    # Explode effect for better visualization
    explode = (0, 0.1)

    # Create the pie chart
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
           shadow=True, startangle=90, colors=['#ff9999','#66b3ff'])

    ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle
    plt.title("Proportion of Liver and Non-Liver Patients", size=20)
    plt.show()
else:
    print("The column 'Result' is not found in the dataset.")
No description has been provided for this image

Bivariate Analysis¶

In [41]:
### Function to plot distributions

def distribution_plot_wrt_target(data, predictor, target):

    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
    )

    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
    )

    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )

    plt.tight_layout()
    plt.show()
In [42]:
distribution_plot_wrt_target(data, "Gender of the patient", "Result")
No description has been provided for this image
  • More male patients have liver disease (Target = 1) compared to females.
  • Non-liver patients (Target = 2) have a more balanced gender distribution, but males still dominate.
  • Boxplots confirm gender differences, showing that males are more affected by liver disease.
  • The distribution suggests a potential gender-based risk factor for liver disease.
In [43]:
distribution_plot_wrt_target(data, "Age of the patient", "Result")
No description has been provided for this image
  • Age distribution is similar for both liver and non-liver patients, peaking around 40-50 years.
  • Liver disease patients (Target = 1) show a slightly wider spread, with more cases at younger and older ages.
  • Boxplots confirm that median age is similar, but liver disease cases have slightly more extreme outliers.
  • No significant age-based distinction, suggesting age alone may not be a strong predictor of liver disease.
In [44]:
distribution_plot_wrt_target(data, "Total Bilirubin", "Result")
No description has been provided for this image
  • Liver patients (Target = 1) have significantly higher Total Bilirubin levels than non-liver patients.
  • Strong right-skewed distribution in both groups, with extreme outliers in liver disease cases (values exceeding 70).
  • Boxplots confirm a higher median bilirubin level in liver disease patients, with a much wider spread.
  • Non-liver patients mostly have bilirubin levels below 2, while liver patients show a much broader range.
In [45]:
distribution_plot_wrt_target(data, "Direct Bilirubin", "Result")
No description has been provided for this image
  • Liver patients (Target = 1) have significantly higher Direct Bilirubin levels than non-liver patients.
  • Strong right-skewed distribution in both groups, with extreme outliers in liver disease cases (values exceeding 15).
  • Boxplots confirm a higher median Direct Bilirubin level in liver disease patients, with a much wider spread.
  • Non-liver patients mostly have Direct Bilirubin levels below 0.5, while liver patients show a much broader range.
In [46]:
distribution_plot_wrt_target(data, "\xa0Alkphos Alkaline Phosphotase", "Result")
No description has been provided for this image
  • Liver patients (Target = 1) have generally higher ALP levels compared to non-liver patients.
  • Strong right-skewed distribution, with extreme outliers above 2000 U/L in liver disease cases.
  • Boxplots show a higher median ALP level in liver patients, with a wider interquartile range (IQR).
  • Non-liver patients mostly have ALP levels below 250, while liver patients exhibit a much broader spread.
In [47]:
distribution_plot_wrt_target(data, "\xa0Sgpt Alamine Aminotransferase", "Result")
No description has been provided for this image
  • Liver patients (Target = 1) have significantly higher SGPT (ALT) levels than non-liver patients.
  • Strong right-skewed distribution, with extreme outliers exceeding 1500 U/L in liver disease cases.
  • Boxplots confirm a higher median SGPT level in liver disease patients, with a much wider interquartile range (IQR).
  • Non-liver patients mostly have SGPT levels below 50, while liver patients exhibit a broader range with high variability.
In [48]:
distribution_plot_wrt_target(data, "Sgot Aspartate Aminotransferase", "Result")
No description has been provided for this image
  • Liver patients (Target = 1) have significantly higher SGOT (AST) levels compared to non-liver patients.
  • Strong right-skewed distribution, with extreme outliers exceeding 4000 U/L in liver disease cases.
  • Boxplots confirm that median SGOT levels are much higher in liver patients, with a wider interquartile range (IQR).
  • Non-liver patients mostly have SGOT levels below 50, while liver patients show a broader range with high variability.
In [49]:
distribution_plot_wrt_target(data, "Total Protiens", "Result")
No description has been provided for this image
  • Total Protein distribution is similar for both liver and non-liver patients, showing a near-normal distribution.
  • Slightly lower total protein levels in liver disease patients (Target = 1) compared to non-liver patients (Target = 2).
  • Boxplots confirm a small difference in median values, but with overlapping interquartile ranges (IQR).
  • Outliers exist in both groups, but no extreme differences, indicating Total Proteins alone is not a strong differentiator for liver disease.
In [50]:
distribution_plot_wrt_target(data, "\xa0ALB Albumin", "Result")
No description has been provided for this image
  • Non-liver patients (Target = 2) generally have higher Albumin levels compared to liver patients (Target = 1).
  • Liver disease patients show a slightly left-skewed distribution, indicating lower albumin levels on average.
  • Boxplots confirm a lower median Albumin level in liver patients, with less variability compared to non-liver patients.
  • Albumin levels could be a useful indicator of liver function, but some overlap exists between the two groups.
In [51]:
distribution_plot_wrt_target(data, "A/G Ratio Albumin and Globulin Ratio", "Result")
No description has been provided for this image
  • Non-liver patients (Target = 2) have higher A/G ratios on average than liver patients (Target = 1).
  • Liver disease patients exhibit a left-skewed distribution, indicating a tendency for lower A/G ratios.
  • Boxplots confirm that the median A/G ratio is lower in liver disease cases, with more outliers at the lower end.
  • A/G ratio may be a useful predictor of liver disease, but some overlap exists between the groups.
In [52]:
# function to plot stacked bar chart


def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 1, 5))
    plt.legend(
        loc="lower left",
        frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
In [53]:
stacked_barplot(data, "Age of the patient", "Result")
Result                  1     2    All
Age of the patient                    
All                 21915  8774  30689
45.0                 1033   434   1467
42.0                  887   398   1285
60.0                  899   378   1277
50.0                  936   372   1308
...                   ...   ...    ...
80.0                    5     4      9
84.0                   12     3     15
77.0                    8     2     10
89.0                    0     2      2
83.0                    1     0      1

[78 rows x 3 columns]
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [54]:
stacked_barplot(data, "Gender of the patient", "Result")
Result                     1     2    All
Gender of the patient                    
All                    21295  8494  29789
Male                   15742  6244  21986
Female                  5553  2250   7803
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [55]:
stacked_barplot(data, "Total Bilirubin", "Result")
Result               1     2    All
Total Bilirubin                    
All              21483  8560  30043
0.7               2014  1915   3929
0.8               2886  1702   4588
0.9               1926   947   2873
0.6               1492   892   2384
...                ...   ...    ...
6.2                 52     0     52
5.9                 48     0     48
5.7                 51     0     51
5.5                 60     0     60
7.4                 50     0     50

[114 rows x 3 columns]
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [56]:
stacked_barplot(data, "Direct Bilirubin", "Result")
Result                1     2    All
Direct Bilirubin                    
All               21536  8594  30130
0.2                5623  4185   9808
0.1                2037  1222   3259
0.3                1642  1047   2689
0.6                 406   396    802
...                 ...   ...    ...
5.2                  59     0     59
5.5                  53     0     53
5.6                  51     0     51
6.0                  55     0     55
4.6                  48     0     48

[81 rows x 3 columns]
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [57]:
stacked_barplot(data, "\xa0Alkphos Alkaline Phosphotase", "Result")
Result                             1     2    All
 Alkphos Alkaline Phosphotase                    
All                            21354  8541  29895
145.0                            149   305    454
180.0                            201   293    494
165.0                            153   267    420
158.0                            255   242    497
...                              ...   ...    ...
272.0                            254     0    254
276.0                             51     0     51
280.0                             98     0     98
282.0                            401     0    401
263.0                            107     0    107

[264 rows x 3 columns]
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [58]:
stacked_barplot(data, "\xa0Sgpt Alamine Aminotransferase", "Result")
Result                              1     2    All
 Sgpt Alamine Aminotransferase                    
All                             21560  8593  30153
18.0                              411   451    862
22.0                              544   433    977
28.0                              451   413    864
32.0                              198   390    588
...                               ...   ...    ...
97.0                               52     0     52
96.0                              105     0    105
95.0                              110     0    110
94.0                               50     0     50
93.0                               53     0     53

[153 rows x 3 columns]
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [59]:
stacked_barplot(data, "Sgot Aspartate Aminotransferase", "Result")
Result                               1     2    All
Sgot Aspartate Aminotransferase                    
All                              21590  8639  30229
23.0                               303   519    822
21.0                               189   510    699
29.0                               240   315    555
28.0                               370   310    680
...                                ...   ...    ...
126.0                               51     0     51
125.0                              109     0    109
116.0                               51     0     51
114.0                               49     0     49
150.0                               53     0     53

[178 rows x 3 columns]
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [60]:
stacked_barplot(data, "Total Protiens", "Result")
Result              1     2    All
Total Protiens                    
All             21586  8642  30228
7.0              1190   481   1671
6.0              1067   468   1535
6.1               504   407    911
8.0               663   365   1028
7.3               582   364    946
6.8              1170   330   1500
5.9               401   328    729
5.2               305   313    618
7.1               853   303   1156
6.9              1020   300   1320
6.7               481   293    774
5.5               621   267    888
7.9               465   264    729
7.2               844   262   1106
6.2               958   258   1216
7.4               351   253    604
6.4               710   250    960
6.5               532   249    781
8.2               189   205    394
6.3               516   205    721
5.6               711   202    913
4.9               167   166    333
5.8               572   164    736
7.8               312   164    476
5.1               379   163    542
6.6               666   159    825
5.3               359   145    504
7.6               323   145    468
4.5                97   117    214
3.9                 0   108    108
7.5               684    99    783
5.7               442    99    541
8.5               143    97    240
5.4               575    95    670
8.4                60    91    151
9.2                59    58    117
4.6               146    54    200
3.7                 0    52     52
4.8               108    52    160
5.0               511    52    563
7.7                98    49    147
3.8                50    49     99
8.1               262    48    310
8.3                96    48    144
4.3               175     1    176
4.0               101     0    101
4.1                99     0     99
3.6               169     0    169
3.0                49     0     49
4.4               214     0    214
4.7                99     0     99
8.6               151     0    151
8.7                48     0     48
8.9                49     0     49
2.8                46     0     46
9.5                47     0     47
9.6                48     0     48
2.7                49     0     49
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [61]:
stacked_barplot(data, "\xa0ALB Albumin", "Result")
Result            1     2    All
 ALB Albumin                    
All           21561  8636  30197
3.0            1777   581   2358
2.9             963   509   1472
3.5             699   475   1174
3.2             884   474   1358
4.0            1453   465   1918
3.9             851   433   1284
4.1             413   414    827
4.2             207   398    605
3.1            1119   362   1481
3.8             400   353    753
3.6             591   352    943
3.7             701   350   1051
2.3             325   311    636
3.3             772   311   1083
4.4             110   295    405
2.6             876   264   1140
2.5             979   253   1232
2.2             424   214    638
4.3             485   203    688
2.7            1120   162   1282
2.8             761   154    915
4.5             146   145    291
1.9             250   114    364
3.4             973   111   1084
1.6             311   108    419
2.4             810   108    918
2.0             991   108   1099
4.6              96   107    203
1.4              50   105    155
2.1             635   100    735
1.8             552    55    607
1.7             113    52    165
5.0               0    49     49
4.7              90    48    138
4.8              55    47    102
4.9             157    46    203
1.0              59     0     59
1.5             162     0    162
5.5              96     0     96
0.9             105     0    105
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [62]:
stacked_barplot(data, "A/G Ratio Albumin and Globulin Ratio", "Result")
Result                                    1     2    All
A/G Ratio Albumin and Globulin Ratio                    
All                                   21547  8585  30132
1.0                                    3823  1600   5423
0.9                                    1905  1153   3058
1.2                                     831   974   1805
1.1                                    1641   721   2362
...                                     ...   ...    ...
1.09                                     59     0     59
1.11                                     58     0     58
1.12                                     49     0     49
0.53                                     48     0     48
0.3                                     226     0    226

[70 rows x 3 columns]
------------------------------------------------------------------------------------------------------------------------
No description has been provided for this image
In [63]:
# Select only numerical columns
numeric_data = data.select_dtypes(include=[np.number])

# Compute correlation matrix
correlation_matrix = numeric_data.corr()

# Create heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Feature Correlation Heatmap")
plt.show()
No description has been provided for this image

EDA Analysis¶

  • The dataset is imbalanced, with significantly more liver patients than non-liver patients.
  • Some features have extreme values (outliers), especially Total Bilirubin Direct Bilirubin, SGPT, SGOT, and Alkaline Phosphatase.
  • Features such as Total Proteins, Albumin, and A/G Ratio have more normally distributed values.
  • Males are more represented than females.
  • There is a higher proportion of liver disease in males compared to females.
  • No significant difference in age distribution between liver and non-liver patients.
  • SGPT, SGOT, and Bilirubin levels are strong indicators of liver disease
  • Liver patients tend to have lower albumin levels than non-liver patients.

Data Preprocessing¶

Remove columns with no relationship to Target column 'Result'

  • Age of the patient (correlation ~0)
  • A/G Ratio Albumin and Globulin Ratio (correlation ~0.16, relatively weak)

Potential Fixes

  • Total Bilirubin and Direct Bilirubin - Highly correlated (~0.89). Drop one of them to avoid multicollinearity
  • Sgpt Alanine Aminotransferase and Sgot Aspartate Aminotransferase - Strong correlation (~0.78). Keep one.
  • Total Proteins and ALB Albumin - Correlated (~0.78). If needed, keep only one.
  • Skewness: Variables like bilirubin and aminotransferases show high skewness. Consider log transformation.
  • Categorical Encoding: If "Gender of the patient" is still in the dataset, encode it properly.
In [64]:
data.drop(['Age of the patient', 'Gender of the patient', 'A/G Ratio Albumin and Globulin Ratio', 'Direct Bilirubin', '\xa0Sgpt Alamine Aminotransferase', 'Total Protiens'], axis = 1, inplace = True)
In [65]:
data.head()
Out[65]:
Total Bilirubin Alkphos Alkaline Phosphotase Sgot Aspartate Aminotransferase ALB Albumin Result
0 0.7 187.0 18.0 3.3 1
1 10.9 699.0 100.0 3.2 1
2 7.3 490.0 68.0 3.3 1
3 1.0 182.0 20.0 3.4 1
4 3.9 195.0 59.0 2.4 1

Null Values

In [66]:
data.isnull().sum()
Out[66]:
0
Total Bilirubin 648
Alkphos Alkaline Phosphotase 796
Sgot Aspartate Aminotransferase 462
ALB Albumin 494
Result 0

In [67]:
data.columns.to_list()
Out[67]:
['Total Bilirubin',
 '\xa0Alkphos Alkaline Phosphotase',
 'Sgot Aspartate Aminotransferase',
 '\xa0ALB Albumin',
 'Result']
In [68]:
# Get the list of columns with missing values
columns_with_missing = [
    'Total Bilirubin',
    '\xa0Alkphos Alkaline Phosphotase',
    'Sgot Aspartate Aminotransferase',
    '\xa0ALB Albumin'
]

# Create a copy of the DataFrame
df_null = data.copy()

# Iterate over each column with missing values
for col in columns_with_missing:
    # Select the column with missing values
    col_data = df_null[col].copy()

    # Create columns with different NaN handling methods
    col_bfill = col_data.fillna(method='bfill')  # Backward fill
    col_ffill = col_data.fillna(method='ffill')  # Forward fill
    col_interpolation = col_data.interpolate(method='linear')  # Linear interpolation
    col_mean = col_data.fillna(col_data.mean())  # Fill with mean
    col_mode = col_data.fillna(col_data.mode()[0])  # Fill with mode

    # Create a list of methods and their corresponding labels
    methods = [
        ("Original", col_data),
        ("Backward Fill", col_bfill),
        ("Forward Fill", col_ffill),
        ("Interpolation", col_interpolation),
        ("Mean Fill", col_mean),
        ("Mode Fill", col_mode)
    ]

    # Create subplots: 2 rows, 3 columns with a larger figure size
    fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(22, 12))  # Increased figsize for better visibility
    fig.suptitle(f"Comparison of Missing Data Handling Methods for {col}", fontsize=20)

    # Flatten axes for easier indexing
    axes = axes.flatten()

    # Plot each method in its own subplot
    for i, (label, method) in enumerate(methods):
        # Plot the method data
        axes[i].plot(method, linewidth=2, label=label, color='gray' if label != "Original" else 'blue')

        # Highlight missing data positions
        if label == "Original":
            # Red scatter points for missing data in the original column
            axes[i].scatter(method.index[method.isnull()], [0] * method.isnull().sum(),
                            color='red', label='Missing Data (Original)', zorder=3)
        else:
            # Red scatter points where filling occurred
            missing_indices = col_data.isnull() & ~method.isnull()  # Locations where fill happened
            axes[i].scatter(missing_indices.index[missing_indices], method[missing_indices],
                            color='red', label='Filled Data', zorder=3)

        # Add titles and labels
        axes[i].set_title(label, fontsize=16)
        axes[i].set_xlabel('Index', fontsize=14)
        axes[i].set_ylabel('Value', fontsize=14)
        axes[i].grid(True)
        axes[i].legend(fontsize=12)

    # Adjust layout to prevent overlap
    plt.tight_layout(rect=[0, 0, 1, 0.95])  # Leave space for suptitle
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
  • Interpolation is ideal for fluctuating variables (Alkphos & Sgot) to maintain data consistency.
  • Mean/Mode Fill is best for stable variables (Bilirubin & ALB) to preserve overall distribution.
In [69]:
data["Total Bilirubin"].fillna(data["Total Bilirubin"].mean(), inplace=True)
data["\xa0ALB Albumin"].fillna(data["\xa0ALB Albumin"].mode()[0], inplace=True)

data["\xa0Alkphos Alkaline Phosphotase"].interpolate(method="linear", inplace=True)
data["Sgot Aspartate Aminotransferase"].interpolate(method="linear", inplace=True)
In [70]:
data.isnull().sum()
Out[70]:
0
Total Bilirubin 0
Alkphos Alkaline Phosphotase 0
Sgot Aspartate Aminotransferase 0
ALB Albumin 0
Result 0

In [71]:
data.duplicated().sum()
Out[71]:
28908
In [72]:
data.head()
Out[72]:
Total Bilirubin Alkphos Alkaline Phosphotase Sgot Aspartate Aminotransferase ALB Albumin Result
0 0.7 187.0 18.0 3.3 1
1 10.9 699.0 100.0 3.2 1
2 7.3 490.0 68.0 3.3 1
3 1.0 182.0 20.0 3.4 1
4 3.9 195.0 59.0 2.4 1

Separate Independand and Dependent Columns

In [73]:
## Separating Independent and Dependent Columns
X = data.drop(['Result'],axis=1)
Y = data[['Result']]
In [74]:
# Import the required function
from sklearn.model_selection import train_test_split

# Splitting the dataset into Training and Testing sets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)
In [75]:
# Convert labels to binary format (1,2 → 0,1) for model compatibility
y_train = (y_train == 2).astype(int)
y_test = (y_test == 2).astype(int)
In [76]:
x_train.head()
Out[76]:
Total Bilirubin Alkphos Alkaline Phosphotase Sgot Aspartate Aminotransferase ALB Albumin
18983 0.7 162.0 41.0 2.5
8417 1.7 859.0 48.0 3.0
14114 6.8 542.0 66.0 3.1
15253 2.2 209.0 20.0 4.0
14647 2.6 236.0 90.0 2.6
In [77]:
y_train.head()
Out[77]:
Result
18983 1
8417 0
14114 0
15253 0
14647 0
In [78]:
###Checking the shape of train and test sets
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
(24552, 4)
(6139, 4)
(24552, 1)
(6139, 1)
In [79]:
sns.countplot(data=data, x='Result', edgecolor = 'black');
No description has been provided for this image
In [80]:
Y["Result"].value_counts(1)
Out[80]:
proportion
Result
1 0.714118
2 0.285882

Model Building¶

Balance Weights

  • Class weights were computed earlier to address the imbalance in the dataset, where Class 1 (Non-Liver Patient) was 71.4% and Class 2 (Liver Patient) was 28.6%. Without balancing, the model might favor the majority class, leading to poor performance in detecting liver patients. Assigning higher weight to the minority class ensures the model learns patterns from both classes fairly, improving recall for liver patients. This step is applied in model.fit() to make training more effective and prevent bias toward the dominant class.
In [81]:
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(Y)  # Keep classes as 1 and 2
class_weights = compute_class_weight(class_weight="balanced", classes=classes, y=Y.values.ravel())
class_weight_dict = {classes[i]: class_weights[i] for i in range(len(classes))}

print(class_weight_dict)  # Should correctly map weights for {1: weight1, 2: weight2}
{1: 0.7001642560569421, 2: 1.7489742420788694}
In [82]:
backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output everytime
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)

Model 1 - "relu"¶

In [83]:
# Import necessary modules
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Initializing the ANN
model1 = Sequential()

# The amount of nodes (dimensions) in hidden layer should be the average of input and output layers, in this case 64.
# This adds the input layer (by specifying input dimension) AND the first hidden layer (units)
model1.add(Dense(64, activation='relu', input_dim=4))  # Change input_dim to 4

# Add 1st hidden layer
model1.add(Dense(32, activation='relu'))  # Hidden layer

# Adding the output layer
# Notice that we do not need to specify input dim.
# We have an output of 1 node, which is the desired dimensions of our output (stay with the bank or not)
# We use the sigmoid because we want probability outcomes
model1.add(Dense(1, activation='sigmoid'))  # Output layer
In [84]:
# Create optimizer with default learning rate
# Compile the model
model1.compile(optimizer='SGD', loss='binary_crossentropy', metrics=['accuracy'])
In [85]:
model1.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ (None, 64)                  │             320 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 32)                  │           2,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 1)                   │              33 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 2,433 (9.50 KB)
 Trainable params: 2,433 (9.50 KB)
 Non-trainable params: 0 (0.00 B)

Train the Model

In [86]:
history = model1.fit(
    x_train, y_train,
    validation_split=0.2,
    epochs=50,
    batch_size=32,
    verbose=1
)
Epoch 1/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7067 - loss: 2.0910 - val_accuracy: 0.7052 - val_loss: 0.5329
Epoch 2/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7163 - loss: 0.5324 - val_accuracy: 0.7052 - val_loss: 0.5312
Epoch 3/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7163 - loss: 0.5308 - val_accuracy: 0.7052 - val_loss: 0.5279
Epoch 4/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7163 - loss: 0.5308 - val_accuracy: 0.7035 - val_loss: 0.5288
Epoch 5/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7159 - loss: 0.5283 - val_accuracy: 0.7029 - val_loss: 0.5266
Epoch 6/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7157 - loss: 0.5276 - val_accuracy: 0.7041 - val_loss: 0.5253
Epoch 7/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7154 - loss: 0.5274 - val_accuracy: 0.7062 - val_loss: 0.5249
Epoch 8/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7148 - loss: 0.5252 - val_accuracy: 0.7009 - val_loss: 0.5230
Epoch 9/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7148 - loss: 0.5248 - val_accuracy: 0.6988 - val_loss: 0.5231
Epoch 10/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7149 - loss: 0.5244 - val_accuracy: 0.7015 - val_loss: 0.5226
Epoch 11/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7151 - loss: 0.5234 - val_accuracy: 0.6988 - val_loss: 0.5216
Epoch 12/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7151 - loss: 0.5230 - val_accuracy: 0.6995 - val_loss: 0.5216
Epoch 13/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7162 - loss: 0.5221 - val_accuracy: 0.7031 - val_loss: 0.5211
Epoch 14/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7161 - loss: 0.5221 - val_accuracy: 0.7076 - val_loss: 0.5202
Epoch 15/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7178 - loss: 0.5212 - val_accuracy: 0.7206 - val_loss: 0.5211
Epoch 16/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7170 - loss: 0.5212 - val_accuracy: 0.7029 - val_loss: 0.5246
Epoch 17/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7156 - loss: 0.5211 - val_accuracy: 0.7019 - val_loss: 0.5236
Epoch 18/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7138 - loss: 0.5223 - val_accuracy: 0.6972 - val_loss: 0.5225
Epoch 19/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7135 - loss: 0.5220 - val_accuracy: 0.7052 - val_loss: 0.5238
Epoch 20/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7151 - loss: 0.5205 - val_accuracy: 0.7013 - val_loss: 0.5231
Epoch 21/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7154 - loss: 0.5194 - val_accuracy: 0.7135 - val_loss: 0.5188
Epoch 22/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.7154 - loss: 0.5196 - val_accuracy: 0.6992 - val_loss: 0.5248
Epoch 23/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7110 - loss: 0.5224 - val_accuracy: 0.7127 - val_loss: 0.5182
Epoch 24/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7139 - loss: 0.5188 - val_accuracy: 0.7194 - val_loss: 0.5202
Epoch 25/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7131 - loss: 0.5185 - val_accuracy: 0.7094 - val_loss: 0.5216
Epoch 26/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7124 - loss: 0.5178 - val_accuracy: 0.7231 - val_loss: 0.5173
Epoch 27/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7139 - loss: 0.5174 - val_accuracy: 0.7188 - val_loss: 0.5177
Epoch 28/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7119 - loss: 0.5166 - val_accuracy: 0.7147 - val_loss: 0.5167
Epoch 29/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7130 - loss: 0.5168 - val_accuracy: 0.7259 - val_loss: 0.5184
Epoch 30/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7137 - loss: 0.5163 - val_accuracy: 0.7223 - val_loss: 0.5184
Epoch 31/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7126 - loss: 0.5174 - val_accuracy: 0.7113 - val_loss: 0.5189
Epoch 32/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7138 - loss: 0.5165 - val_accuracy: 0.7196 - val_loss: 0.5165
Epoch 33/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7148 - loss: 0.5156 - val_accuracy: 0.7188 - val_loss: 0.5187
Epoch 34/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7121 - loss: 0.5156 - val_accuracy: 0.7147 - val_loss: 0.5150
Epoch 35/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7139 - loss: 0.5156 - val_accuracy: 0.7109 - val_loss: 0.5174
Epoch 36/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7127 - loss: 0.5152 - val_accuracy: 0.7180 - val_loss: 0.5146
Epoch 37/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7137 - loss: 0.5152 - val_accuracy: 0.7127 - val_loss: 0.5199
Epoch 38/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7123 - loss: 0.5147 - val_accuracy: 0.7157 - val_loss: 0.5138
Epoch 39/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7116 - loss: 0.5160 - val_accuracy: 0.7296 - val_loss: 0.5121
Epoch 40/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7138 - loss: 0.5139 - val_accuracy: 0.7214 - val_loss: 0.5168
Epoch 41/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7148 - loss: 0.5144 - val_accuracy: 0.7245 - val_loss: 0.5095
Epoch 42/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7123 - loss: 0.5142 - val_accuracy: 0.7216 - val_loss: 0.5158
Epoch 43/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7122 - loss: 0.5124 - val_accuracy: 0.7235 - val_loss: 0.5105
Epoch 44/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7128 - loss: 0.5129 - val_accuracy: 0.7192 - val_loss: 0.5100
Epoch 45/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7132 - loss: 0.5126 - val_accuracy: 0.7135 - val_loss: 0.5174
Epoch 46/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7126 - loss: 0.5129 - val_accuracy: 0.7249 - val_loss: 0.5097
Epoch 47/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.7131 - loss: 0.5131 - val_accuracy: 0.7290 - val_loss: 0.5122
Epoch 48/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.7129 - loss: 0.5124 - val_accuracy: 0.7074 - val_loss: 0.5208
Epoch 49/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7112 - loss: 0.5127 - val_accuracy: 0.7188 - val_loss: 0.5138
Epoch 50/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.7145 - loss: 0.5110 - val_accuracy: 0.7188 - val_loss: 0.5161
In [87]:
# Capturing learning history per epoch
hist  = pd.DataFrame(history.history)
hist['epoch'] = history.epoch

# Plotting accuracy at different epochs
plt.plot(hist['loss'])
plt.plot(hist['val_loss'])
plt.legend(("train" , "valid") , loc =0)

#Printing results
results = model1.evaluate(x_test, y_test)
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7020 - loss: 0.5253
No description has been provided for this image
  • Initial spike: Model starts with high loss (~0.8) but quickly drops.
  • Stabilization: Loss flattens near 0.5, suggesting the model is learning but may be stuck.
  • No overfitting: Train and validation curves stay close, indicating generalization.
  • Potential issue: Loss around 0.5 suggests random guessing. Model may need feature tuning, better architecture, or different loss function.
In [88]:
y_pred=model1.predict(x_test)
y_pred = (y_pred > 0.5) # cut off point (threshold)
y_pred
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Out[88]:
array([[False],
       [ True],
       [False],
       ...,
       [False],
       [False],
       [False]])
  • 1 = Non-Liver Patient

  • 2 = Liver Patient

  • False Positive (FP): Misclassifying a Non-Liver Patient as a Liver Patient

    • What happens? A healthy person (non-liver patient) is wrongly classified as having liver disease.
    • Risk:
    • They might undergo unnecessary medical tests, treatments, or even invasive procedures.
    • Increased emotional stress and financial burden due to unnecessary healthcare costs.
    • Misallocation of medical resources that could be used for actual liver patients.
  • False Negative (FN): Misclassifying a Liver Patient as a Non-Liver Patient

    • What happens? A person with liver disease is wrongly classified as healthy.
    • Risk:
      • The most critical risk—delayed diagnosis and treatment.
      • Disease could worsen, leading to complications like liver failure or cirrhosis.
      • Higher mortality risk if the condition is not treated in time.
  • False Negative

    • worse because they lead to untreated disease, which can become life-threatening
  • false positives

    • should also be minimized to avoid unnecessary medical interventions.

Create custom confusion matrix

In [89]:
def make_confusion_matrix(cf,
                          group_names=None,
                          categories='auto',
                          count=True,
                          percent=True,
                          cbar=True,
                          xyticks=True,
                          xyplotlabels=True,
                          sum_stats=True,
                          figsize=None,
                          cmap='Blues',
                          title=None):
    '''
    This function will make a pretty plot of an sklearn Confusion Matrix cm using a Seaborn heatmap visualization.
    Arguments
    '''


    # CODE TO GENERATE TEXT INSIDE EACH SQUARE
    blanks = ['' for i in range(cf.size)]

    if group_names and len(group_names)==cf.size:
        group_labels = ["{}\n".format(value) for value in group_names]
    else:
        group_labels = blanks

    if count:
        group_counts = ["{0:0.0f}\n".format(value) for value in cf.flatten()]
    else:
        group_counts = blanks

    if percent:
        group_percentages = ["{0:.2%}".format(value) for value in cf.flatten()/np.sum(cf)]
    else:
        group_percentages = blanks

    box_labels = [f"{v1}{v2}{v3}".strip() for v1, v2, v3 in zip(group_labels,group_counts,group_percentages)]
    box_labels = np.asarray(box_labels).reshape(cf.shape[0],cf.shape[1])


    # CODE TO GENERATE SUMMARY STATISTICS & TEXT FOR SUMMARY STATS
    if sum_stats:
        #Accuracy is sum of diagonal divided by total observations
        accuracy  = np.trace(cf) / float(np.sum(cf))



    # SET FIGURE PARAMETERS ACCORDING TO OTHER ARGUMENTS
    if figsize==None:
        #Get default figure size if not set
        figsize = plt.rcParams.get('figure.figsize')

    if xyticks==False:
        #Do not show categories if xyticks is False
        categories=False


    # MAKE THE HEATMAP VISUALIZATION
    plt.figure(figsize=figsize)
    sns.heatmap(cf,annot=box_labels,fmt="",cmap=cmap,cbar=cbar,xticklabels=categories,yticklabels=categories)


    if title:
        plt.title(title)
In [90]:
# Import necessary libraries
from sklearn.metrics import confusion_matrix, classification_report
from sklearn import metrics

# Calculate the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Define labels and categories for visualization
labels = ['True Negative', 'False Positive', 'False Negative', 'True Positive']
categories = ['Non-Liver Patient', 'Liver Patient']  # Corrected categories

# Display the confusion matrix
make_confusion_matrix(cm,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')

# Print the classification report
print("Classification Report:\n")
print(metrics.classification_report(y_test, y_pred))
Classification Report:

              precision    recall  f1-score   support

           0       0.76      0.87      0.81      4384
           1       0.50      0.32      0.39      1755

    accuracy                           0.71      6139
   macro avg       0.63      0.59      0.60      6139
weighted avg       0.69      0.71      0.69      6139

No description has been provided for this image
  • Accuracy: 71% – The model is moderately accurate.
  • Precision & Recall:
    • Non-Liver Patient (0): Precision = 76%, Recall = 87% (good at identifying non-liver patients).
    • Liver Patient (1): Precision = 50%, Recall = 32% (poor recall, many liver patients misclassified).
  • Confusion Matrix:
    • True Negatives (TN): 3829 (62.37%) – Correctly classified non-liver patients.
    • False Positives (FP): 555 (9.04%) – Misclassified non-liver patients as liver patients.
    • False Negatives (FN): 1200 (19.55%) – Misclassified liver patients as non-liver.
    • True Positives (TP): 555 (9.04%) – Correctly classified liver patients.
  • Key Issue: High false negatives indicate the model struggles to detect liver patients. Consider better class balancing, adjusting thresholds, or improving feature selection.

Model 2 - "sequential"¶

  • Create more hidden layers
In [91]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [92]:
model2 = Sequential()
      #Adding the hidden and output layers
model2.add(Dense(256,activation='relu',kernel_initializer='he_uniform',input_dim = x_train.shape[1]))
model2.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
model2.add(Dense(64,activation='relu',kernel_initializer='he_uniform'))
model2.add(Dense(32,activation='relu',kernel_initializer='he_uniform'))
model2.add(Dense(1, activation = 'sigmoid'))
      #Compiling the ANN with Adam optimizer and binary cross entropy loss function
optimizer = tf.keras.optimizers.Adam(0.001)
model2.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
In [93]:
model2.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ (None, 256)                 │           1,280 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 128)                 │          32,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 32)                  │           2,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_4 (Dense)                      │ (None, 1)                   │              33 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 44,545 (174.00 KB)
 Trainable params: 44,545 (174.00 KB)
 Non-trainable params: 0 (0.00 B)
In [94]:
history2 = model2.fit(x_train,y_train,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6363 - loss: 5.9120 - val_accuracy: 0.7031 - val_loss: 1.0647
Epoch 2/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6547 - loss: 1.2863 - val_accuracy: 0.6976 - val_loss: 0.7364
Epoch 3/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6713 - loss: 0.8041 - val_accuracy: 0.7052 - val_loss: 1.8411
Epoch 4/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6615 - loss: 1.0397 - val_accuracy: 0.7031 - val_loss: 0.9344
Epoch 5/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6823 - loss: 0.6650 - val_accuracy: 0.7113 - val_loss: 0.5449
Epoch 6/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6900 - loss: 0.6041 - val_accuracy: 0.7031 - val_loss: 0.9151
Epoch 7/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6886 - loss: 0.6098 - val_accuracy: 0.6836 - val_loss: 0.5215
Epoch 8/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6864 - loss: 0.6318 - val_accuracy: 0.7052 - val_loss: 0.7875
Epoch 9/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6976 - loss: 0.5614 - val_accuracy: 0.7049 - val_loss: 0.5829
Epoch 10/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6881 - loss: 0.5866 - val_accuracy: 0.6577 - val_loss: 0.5664
Epoch 11/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.6982 - loss: 0.5455 - val_accuracy: 0.6789 - val_loss: 0.5450
Epoch 12/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7011 - loss: 0.5340 - val_accuracy: 0.6777 - val_loss: 0.5449
Epoch 13/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7044 - loss: 0.5227 - val_accuracy: 0.6980 - val_loss: 0.5106
Epoch 14/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7075 - loss: 0.5189 - val_accuracy: 0.6703 - val_loss: 0.5467
Epoch 15/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7073 - loss: 0.5164 - val_accuracy: 0.6862 - val_loss: 0.5331
Epoch 16/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7066 - loss: 0.5367 - val_accuracy: 0.7060 - val_loss: 0.5181
Epoch 17/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7060 - loss: 0.5164 - val_accuracy: 0.6549 - val_loss: 0.5606
Epoch 18/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.7019 - loss: 0.5178 - val_accuracy: 0.6585 - val_loss: 0.5534
Epoch 19/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7027 - loss: 0.5168 - val_accuracy: 0.6632 - val_loss: 0.5590
Epoch 20/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7054 - loss: 0.5173 - val_accuracy: 0.6567 - val_loss: 0.5707
Epoch 21/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7071 - loss: 0.5180 - val_accuracy: 0.7013 - val_loss: 0.5296
Epoch 22/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7087 - loss: 0.5123 - val_accuracy: 0.6738 - val_loss: 0.5439
Epoch 23/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7037 - loss: 0.5222 - val_accuracy: 0.7096 - val_loss: 0.5152
Epoch 24/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7115 - loss: 0.5093 - val_accuracy: 0.7084 - val_loss: 0.5080
Epoch 25/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7111 - loss: 0.5086 - val_accuracy: 0.7288 - val_loss: 0.5124
Epoch 26/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7093 - loss: 0.5098 - val_accuracy: 0.7184 - val_loss: 0.5114
Epoch 27/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7181 - loss: 0.5066 - val_accuracy: 0.6874 - val_loss: 0.5382
Epoch 28/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7130 - loss: 0.5111 - val_accuracy: 0.7296 - val_loss: 0.5092
Epoch 29/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7102 - loss: 0.5129 - val_accuracy: 0.6597 - val_loss: 0.5672
Epoch 30/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7082 - loss: 0.5175 - val_accuracy: 0.6221 - val_loss: 0.5863
Epoch 31/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.7057 - loss: 0.5220 - val_accuracy: 0.6663 - val_loss: 0.5697
Epoch 32/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.7070 - loss: 0.5176 - val_accuracy: 0.7145 - val_loss: 0.5064
Epoch 33/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7186 - loss: 0.5036 - val_accuracy: 0.7157 - val_loss: 0.5028
Epoch 34/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7164 - loss: 0.5030 - val_accuracy: 0.6656 - val_loss: 0.5473
Epoch 35/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7106 - loss: 0.5116 - val_accuracy: 0.7263 - val_loss: 0.5037
Epoch 36/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7149 - loss: 0.5011 - val_accuracy: 0.7088 - val_loss: 0.5094
Epoch 37/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7162 - loss: 0.5016 - val_accuracy: 0.7182 - val_loss: 0.4998
Epoch 38/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7208 - loss: 0.5002 - val_accuracy: 0.7206 - val_loss: 0.4996
Epoch 39/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7177 - loss: 0.5035 - val_accuracy: 0.7027 - val_loss: 0.5044
Epoch 40/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7182 - loss: 0.5010 - val_accuracy: 0.7068 - val_loss: 0.5021
Epoch 41/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7184 - loss: 0.4996 - val_accuracy: 0.7060 - val_loss: 0.5055
Epoch 42/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7174 - loss: 0.4989 - val_accuracy: 0.7015 - val_loss: 0.5038
Epoch 43/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7179 - loss: 0.4990 - val_accuracy: 0.7031 - val_loss: 0.5055
Epoch 44/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7164 - loss: 0.4996 - val_accuracy: 0.7155 - val_loss: 0.5080
Epoch 45/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7199 - loss: 0.4998 - val_accuracy: 0.7049 - val_loss: 0.4954
Epoch 46/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7173 - loss: 0.4983 - val_accuracy: 0.7157 - val_loss: 0.5067
Epoch 47/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7193 - loss: 0.4978 - val_accuracy: 0.7019 - val_loss: 0.5026
Epoch 48/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7173 - loss: 0.4984 - val_accuracy: 0.7019 - val_loss: 0.5036
Epoch 49/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7176 - loss: 0.4963 - val_accuracy: 0.7119 - val_loss: 0.5001
Epoch 50/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7202 - loss: 0.4955 - val_accuracy: 0.7043 - val_loss: 0.5148
In [95]:
#Plotting Train Loss vs Validation Loss
plt.plot(history2.history['loss'])
plt.plot(history2.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
  • The loss decreases rapidly in the initial epochs and stabilizes around 0.5, indicating effective learning.
  • Spikes in validation loss at the start suggest fluctuations, but it aligns well with training loss later, implying no severe overfitting.
  • The model appears to generalize reasonably well.

ROC (Receiver Operating Characteristic) curve

  • evaluates the performance of your classification model by plotting the True Positive Rate (TPR) vs. False Positive Rate (FPR) at various threshold levels.
In [96]:
from sklearn.metrics import roc_curve

from matplotlib import pyplot

# predict probabilities
yhat1 = model2.predict(x_test)
# keep probabilities for the positive outcome only
yhat1 = yhat1[:, 0]
# calculate roc curves
fpr, tpr, thresholds1 = roc_curve(y_test, yhat1)
# calculate the g-mean for each threshold
gmeans1 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans1)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds1[ix], gmeans1[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Best Threshold=0.420847, G-Mean=0.700
No description has been provided for this image
  • Curve Performance: Model is better than random guessing (diagonal line), but it's not optimal. The curve should be closer to the top-left corner for a better classifier.
  • Best Threshold: The black dot indicates the threshold that optimally balances sensitivity and specificity using the G-Mean.
  • Improvements Needed: If the curve is not close to 1, consider improving feature selection, balancing the dataset, or tuning the model's hyperparameters.

Tuning the threshold using ROC-AUC

In [97]:
#Predicting the results using best as a threshold
y_pred_e1=model2.predict(x_test)
y_pred_e1 = (y_pred_e1 > thresholds1[ix]) # threshold inputted
y_pred_e1
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Out[97]:
array([[False],
       [ True],
       [ True],
       ...,
       [ True],
       [False],
       [False]])
In [98]:
# Import necessary libraries
from sklearn.metrics import confusion_matrix, classification_report
from sklearn import metrics

# Calculate the confusion matrix
cm = confusion_matrix(y_test, y_pred_e1)

# Define labels and categories for visualization
labels = ['True Negative', 'False Positive', 'False Negative', 'True Positive']
categories = ['Non-Liver Patient', 'Liver Patient']  # Corrected categories

# Display the confusion matrix
make_confusion_matrix(cm,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')

# Print the classification report
print("Classification Report:\n")
print(metrics.classification_report(y_test, y_pred_e1))
Classification Report:

              precision    recall  f1-score   support

           0       0.89      0.60      0.72      4384
           1       0.45      0.82      0.58      1755

    accuracy                           0.66      6139
   macro avg       0.67      0.71      0.65      6139
weighted avg       0.77      0.66      0.68      6139

No description has been provided for this image
  • False Positives (28.62%): Healthy individuals misclassified as liver patients, leading to unnecessary anxiety and treatment.
  • False Negatives (5.21%): Liver patients wrongly classified as healthy, posing serious health risks due to missed diagnosis.
  • Recall for Liver Patients (82%): Good at identifying actual cases but low precision (45%) means many false alarms.

Model 3 - Batch Normalization techniqu¶

Normalize the data after each layer & less layers

In [99]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [100]:
# Import necessary modules
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

# Initialize the ANN
model3 = Sequential()

# Add input layer with first hidden layer
model3.add(Dense(128, activation='relu', input_dim=x_train.shape[1]))

# Add Batch Normalization
model3.add(BatchNormalization())

# Add more hidden layers
model3.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model3.add(BatchNormalization())

model3.add(Dense(32, activation='relu', kernel_initializer='he_uniform'))

# Add output layer
model3.add(Dense(1, activation='sigmoid'))
In [101]:
model3.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ (None, 128)                 │             640 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization                  │ (None, 128)                 │             512 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_1                │ (None, 64)                  │             256 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 32)                  │           2,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 1)                   │              33 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 11,777 (46.00 KB)
 Trainable params: 11,393 (44.50 KB)
 Non-trainable params: 384 (1.50 KB)
In [102]:
optimizer = tf.keras.optimizers.Adam(0.001)
model3.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
In [103]:
history_3 = model3.fit(x_train,y_train,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.6675 - loss: 0.5824 - val_accuracy: 0.7094 - val_loss: 0.5064
Epoch 2/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7257 - loss: 0.4941 - val_accuracy: 0.7251 - val_loss: 0.4895
Epoch 3/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7312 - loss: 0.4878 - val_accuracy: 0.7147 - val_loss: 0.4834
Epoch 4/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7340 - loss: 0.4820 - val_accuracy: 0.7294 - val_loss: 0.4917
Epoch 5/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7396 - loss: 0.4764 - val_accuracy: 0.7216 - val_loss: 0.4750
Epoch 6/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7439 - loss: 0.4720 - val_accuracy: 0.7168 - val_loss: 0.4944
Epoch 7/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7464 - loss: 0.4662 - val_accuracy: 0.7218 - val_loss: 0.4799
Epoch 8/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7475 - loss: 0.4625 - val_accuracy: 0.7102 - val_loss: 0.4901
Epoch 9/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7502 - loss: 0.4587 - val_accuracy: 0.7656 - val_loss: 0.4495
Epoch 10/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7546 - loss: 0.4526 - val_accuracy: 0.7459 - val_loss: 0.4594
Epoch 11/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7580 - loss: 0.4495 - val_accuracy: 0.7483 - val_loss: 0.4525
Epoch 12/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7637 - loss: 0.4458 - val_accuracy: 0.7662 - val_loss: 0.4376
Epoch 13/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7591 - loss: 0.4488 - val_accuracy: 0.7442 - val_loss: 0.4698
Epoch 14/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7643 - loss: 0.4455 - val_accuracy: 0.7522 - val_loss: 0.4567
Epoch 15/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7634 - loss: 0.4430 - val_accuracy: 0.7550 - val_loss: 0.4420
Epoch 16/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7695 - loss: 0.4376 - val_accuracy: 0.7689 - val_loss: 0.4322
Epoch 17/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7679 - loss: 0.4354 - val_accuracy: 0.7628 - val_loss: 0.4334
Epoch 18/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7737 - loss: 0.4320 - val_accuracy: 0.7679 - val_loss: 0.4209
Epoch 19/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7740 - loss: 0.4296 - val_accuracy: 0.7418 - val_loss: 0.4384
Epoch 20/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7750 - loss: 0.4262 - val_accuracy: 0.7605 - val_loss: 0.4325
Epoch 21/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7781 - loss: 0.4236 - val_accuracy: 0.7838 - val_loss: 0.4217
Epoch 22/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7814 - loss: 0.4202 - val_accuracy: 0.7742 - val_loss: 0.4342
Epoch 23/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7844 - loss: 0.4170 - val_accuracy: 0.7950 - val_loss: 0.4134
Epoch 24/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7866 - loss: 0.4136 - val_accuracy: 0.7950 - val_loss: 0.4126
Epoch 25/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7911 - loss: 0.4102 - val_accuracy: 0.7809 - val_loss: 0.4179
Epoch 26/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7922 - loss: 0.4083 - val_accuracy: 0.7644 - val_loss: 0.4313
Epoch 27/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7902 - loss: 0.4053 - val_accuracy: 0.7791 - val_loss: 0.4043
Epoch 28/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7921 - loss: 0.4039 - val_accuracy: 0.7723 - val_loss: 0.4156
Epoch 29/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7951 - loss: 0.3972 - val_accuracy: 0.7976 - val_loss: 0.3944
Epoch 30/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7966 - loss: 0.3933 - val_accuracy: 0.7890 - val_loss: 0.3979
Epoch 31/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8040 - loss: 0.3881 - val_accuracy: 0.8021 - val_loss: 0.3959
Epoch 32/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8035 - loss: 0.3848 - val_accuracy: 0.7907 - val_loss: 0.3950
Epoch 33/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8062 - loss: 0.3810 - val_accuracy: 0.7982 - val_loss: 0.4026
Epoch 34/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8094 - loss: 0.3777 - val_accuracy: 0.7848 - val_loss: 0.3993
Epoch 35/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8122 - loss: 0.3729 - val_accuracy: 0.7884 - val_loss: 0.4227
Epoch 36/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8129 - loss: 0.3730 - val_accuracy: 0.8157 - val_loss: 0.3769
Epoch 37/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8133 - loss: 0.3687 - val_accuracy: 0.7970 - val_loss: 0.3980
Epoch 38/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8155 - loss: 0.3662 - val_accuracy: 0.8059 - val_loss: 0.3811
Epoch 39/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.8168 - loss: 0.3638 - val_accuracy: 0.7937 - val_loss: 0.3938
Epoch 40/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8209 - loss: 0.3585 - val_accuracy: 0.8161 - val_loss: 0.3742
Epoch 41/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8170 - loss: 0.3601 - val_accuracy: 0.7899 - val_loss: 0.3968
Epoch 42/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8224 - loss: 0.3568 - val_accuracy: 0.8206 - val_loss: 0.3842
Epoch 43/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8274 - loss: 0.3492 - val_accuracy: 0.8009 - val_loss: 0.4058
Epoch 44/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8244 - loss: 0.3533 - val_accuracy: 0.8090 - val_loss: 0.3976
Epoch 45/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8255 - loss: 0.3506 - val_accuracy: 0.8104 - val_loss: 0.3868
Epoch 46/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8310 - loss: 0.3461 - val_accuracy: 0.8137 - val_loss: 0.3782
Epoch 47/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8293 - loss: 0.3442 - val_accuracy: 0.7909 - val_loss: 0.3986
Epoch 48/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.8322 - loss: 0.3417 - val_accuracy: 0.8015 - val_loss: 0.3729
Epoch 49/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8334 - loss: 0.3415 - val_accuracy: 0.7833 - val_loss: 0.4363
Epoch 50/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8335 - loss: 0.3433 - val_accuracy: 0.8318 - val_loss: 0.3608
In [104]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_3.history['loss'])
plt.plot(history_3.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
  • Training loss decreases steadily, indicating the model is learning.
  • Validation loss fluctuates but follows a downward trend, suggesting generalization.
  • Slight overfitting: Validation loss is higher and more unstable than training loss

Receiver Operating Characteristic (ROC) curve

In [105]:
from sklearn.metrics import roc_curve

from matplotlib import pyplot


# predict probabilities
yhat2 = model3.predict(x_test)
# keep probabilities for the positive outcome only
yhat2 = yhat2[:, 0]
# calculate roc curves
fpr, tpr, thresholds2 = roc_curve(y_test, yhat2)
# calculate the g-mean for each threshold
gmeans2 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans2)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds2[ix], gmeans2[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Change Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Best Threshold=0.209728, G-Mean=0.797
No description has been provided for this image
  • The ROC curve shows the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR).
  • The model performs better than random guessing (diagonal line), with a G-Mean of 0.797 at the best threshold (~0.21).
  • A higher curve indicates better classification ability.
  • However, performance may still need improvement if False Positives or False Negatives are critical.
In [106]:
y_pred_e2=model3.predict(x_test)
y_pred_e2 = (y_pred_e2 > thresholds2[ix])
y_pred_e2
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Out[106]:
array([[False],
       [ True],
       [False],
       ...,
       [ True],
       [False],
       [False]])
In [107]:
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm2=confusion_matrix(y_test, y_pred_e2)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Patient Liver']
make_confusion_matrix(cm2,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
No description has been provided for this image
In [108]:
#Accuracy as per the classification report
from sklearn import metrics
cr2=metrics.classification_report(y_test,y_pred_e2)
print(cr2)
              precision    recall  f1-score   support

           0       0.96      0.69      0.80      4384
           1       0.54      0.92      0.68      1755

    accuracy                           0.76      6139
   macro avg       0.75      0.81      0.74      6139
weighted avg       0.84      0.76      0.77      6139

  • The confusion matrix shows that the model correctly predicts 49.14% (3017) True Negatives and 26.37% (1619) True Positives.
  • However, 22.27% (1367) are False Positives, meaning many were incorrectly classified as positive. 2.22% (136) are False Negatives, which is relatively low. If False Positives are costly, improving precision is necessary.
  • The classification report shows an overall accuracy of 76%. Class 0 (Non-Liver Patients) has high precision (96%) but lower recall (69%), meaning it identifies most negatives but misses some positives.
  • Class 1 (Liver Patients) has moderate precision (54%) but high recall (92%), meaning it catches most positive cases but misclassifies some negatives.
  • The model prioritizes recall for Liver Patients, reducing False Negatives, which is crucial in medical applications.

Model 4 - Dropout technique¶

In [109]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [110]:
# Import necessary modules
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Initialize the ANN
model4 = Sequential()

# Add input layer with first hidden layer
model4.add(Dense(256, activation='relu', input_dim=x_train.shape[1]))
model4.add(Dropout(0.2))

# Add more hidden layers with Dropout for regularization
model4.add(Dense(128, activation='relu'))
model4.add(Dropout(0.2))

model4.add(Dense(64, activation='relu'))
model4.add(Dropout(0.2))

model4.add(Dense(32, activation='relu'))

# Add output layer
model4.add(Dense(1, activation='sigmoid'))
In [111]:
model4.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ (None, 256)                 │           1,280 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout (Dropout)                    │ (None, 256)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 128)                 │          32,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_1 (Dropout)                  │ (None, 128)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_2 (Dropout)                  │ (None, 64)                  │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 32)                  │           2,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_4 (Dense)                      │ (None, 1)                   │              33 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 44,545 (174.00 KB)
 Trainable params: 44,545 (174.00 KB)
 Non-trainable params: 0 (0.00 B)
In [112]:
optimizer = tf.keras.optimizers.Adam(0.001)
model4.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
In [113]:
history_4 = model4.fit(x_train,y_train,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6440 - loss: 1.7209 - val_accuracy: 0.7052 - val_loss: 0.5486
Epoch 2/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7024 - loss: 0.5628 - val_accuracy: 0.7052 - val_loss: 0.5407
Epoch 3/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.7129 - loss: 0.5472 - val_accuracy: 0.7052 - val_loss: 0.5389
Epoch 4/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7123 - loss: 0.5441 - val_accuracy: 0.7052 - val_loss: 0.5342
Epoch 5/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7133 - loss: 0.5384 - val_accuracy: 0.7052 - val_loss: 0.5341
Epoch 6/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7134 - loss: 0.5345 - val_accuracy: 0.7035 - val_loss: 0.5308
Epoch 7/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7133 - loss: 0.5338 - val_accuracy: 0.7013 - val_loss: 0.5269
Epoch 8/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.7143 - loss: 0.5277 - val_accuracy: 0.7029 - val_loss: 0.5223
Epoch 9/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7158 - loss: 0.5258 - val_accuracy: 0.6982 - val_loss: 0.5208
Epoch 10/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7134 - loss: 0.5221 - val_accuracy: 0.7015 - val_loss: 0.5166
Epoch 11/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7121 - loss: 0.5198 - val_accuracy: 0.7035 - val_loss: 0.5165
Epoch 12/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7155 - loss: 0.5183 - val_accuracy: 0.7015 - val_loss: 0.5142
Epoch 13/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.7140 - loss: 0.5160 - val_accuracy: 0.6990 - val_loss: 0.5152
Epoch 14/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7130 - loss: 0.5171 - val_accuracy: 0.7021 - val_loss: 0.5122
Epoch 15/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7154 - loss: 0.5139 - val_accuracy: 0.7015 - val_loss: 0.5082
Epoch 16/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7150 - loss: 0.5127 - val_accuracy: 0.7013 - val_loss: 0.5127
Epoch 17/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7145 - loss: 0.5075 - val_accuracy: 0.7015 - val_loss: 0.5121
Epoch 18/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7155 - loss: 0.5070 - val_accuracy: 0.7098 - val_loss: 0.5045
Epoch 19/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7153 - loss: 0.5073 - val_accuracy: 0.7070 - val_loss: 0.5032
Epoch 20/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.7157 - loss: 0.5018 - val_accuracy: 0.7100 - val_loss: 0.5042
Epoch 21/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7162 - loss: 0.5033 - val_accuracy: 0.7021 - val_loss: 0.4965
Epoch 22/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.7172 - loss: 0.5038 - val_accuracy: 0.7231 - val_loss: 0.4980
Epoch 23/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7168 - loss: 0.4998 - val_accuracy: 0.7076 - val_loss: 0.4982
Epoch 24/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7189 - loss: 0.5001 - val_accuracy: 0.7019 - val_loss: 0.4945
Epoch 25/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7175 - loss: 0.4986 - val_accuracy: 0.7035 - val_loss: 0.4936
Epoch 26/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7196 - loss: 0.5008 - val_accuracy: 0.7263 - val_loss: 0.4963
Epoch 27/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7234 - loss: 0.4973 - val_accuracy: 0.7347 - val_loss: 0.4896
Epoch 28/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7256 - loss: 0.4975 - val_accuracy: 0.7243 - val_loss: 0.4898
Epoch 29/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7298 - loss: 0.4924 - val_accuracy: 0.7276 - val_loss: 0.4898
Epoch 30/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7270 - loss: 0.4941 - val_accuracy: 0.7345 - val_loss: 0.4932
Epoch 31/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7310 - loss: 0.4942 - val_accuracy: 0.7365 - val_loss: 0.4874
Epoch 32/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7337 - loss: 0.4912 - val_accuracy: 0.7418 - val_loss: 0.4853
Epoch 33/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7326 - loss: 0.4901 - val_accuracy: 0.7363 - val_loss: 0.4871
Epoch 34/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7359 - loss: 0.4886 - val_accuracy: 0.7322 - val_loss: 0.4844
Epoch 35/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7321 - loss: 0.4887 - val_accuracy: 0.7359 - val_loss: 0.4883
Epoch 36/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7364 - loss: 0.4870 - val_accuracy: 0.7355 - val_loss: 0.4775
Epoch 37/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7403 - loss: 0.4830 - val_accuracy: 0.7379 - val_loss: 0.4769
Epoch 38/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7398 - loss: 0.4833 - val_accuracy: 0.7445 - val_loss: 0.4746
Epoch 39/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7422 - loss: 0.4802 - val_accuracy: 0.7434 - val_loss: 0.4794
Epoch 40/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7380 - loss: 0.4831 - val_accuracy: 0.7489 - val_loss: 0.4739
Epoch 41/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7389 - loss: 0.4819 - val_accuracy: 0.7398 - val_loss: 0.4760
Epoch 42/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7418 - loss: 0.4813 - val_accuracy: 0.7314 - val_loss: 0.4774
Epoch 43/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7392 - loss: 0.4788 - val_accuracy: 0.7508 - val_loss: 0.4738
Epoch 44/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7449 - loss: 0.4803 - val_accuracy: 0.7406 - val_loss: 0.4811
Epoch 45/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7443 - loss: 0.4809 - val_accuracy: 0.7485 - val_loss: 0.4681
Epoch 46/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7431 - loss: 0.4772 - val_accuracy: 0.7445 - val_loss: 0.4681
Epoch 47/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7468 - loss: 0.4739 - val_accuracy: 0.7451 - val_loss: 0.4765
Epoch 48/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7436 - loss: 0.4801 - val_accuracy: 0.7440 - val_loss: 0.4649
Epoch 49/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.7485 - loss: 0.4709 - val_accuracy: 0.7455 - val_loss: 0.4727
Epoch 50/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7489 - loss: 0.4741 - val_accuracy: 0.7526 - val_loss: 0.4599
In [114]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_4.history['loss'])
plt.plot(history_4.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
  • The loss curve shows a sharp initial drop, stabilizing after a few epochs. - Training and validation losses decrease consistently, staying close, indicating good generalization with no severe overfitting.
  • The model continues learning, but improvements slow after ~10 epochs.
In [115]:
from sklearn.metrics import roc_curve

from matplotlib import pyplot


# predict probabilities
yhat3 = model4.predict(x_test)
# keep probabilities for the positive outcome only
yhat3 = yhat3[:, 0]
# calculate roc curves
fpr, tpr, thresholds3 = roc_curve(y_test, yhat3)
# calculate the g-mean for each threshold
gmeans3 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans3)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds3[ix], gmeans3[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step
Best Threshold=0.345194, G-Mean=0.728
No description has been provided for this image
  • The ROC curve indicates moderate model performance with a G-Mean of 0.728, balancing sensitivity and specificity.
  • The curve rises above the no-skill line, showing predictive power, but there's room for improvement.
  • The best threshold (0.345) suggests an optimal trade-off between false positives and false negatives.
In [116]:
y_pred_e3=model4.predict(x_test)
y_pred_e3 = (y_pred_e3 > thresholds3[ix])
y_pred_e3
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Out[116]:
array([[False],
       [ True],
       [ True],
       ...,
       [ True],
       [False],
       [False]])
In [117]:
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm3=confusion_matrix(y_test, y_pred_e3)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm3,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
No description has been provided for this image
In [118]:
#Accuracy as per the classification report
from sklearn import metrics
cr3=metrics.classification_report(y_test,y_pred_e3)
print(cr3)
              precision    recall  f1-score   support

           0       0.91      0.63      0.74      4384
           1       0.48      0.84      0.61      1755

    accuracy                           0.69      6139
   macro avg       0.69      0.74      0.68      6139
weighted avg       0.78      0.69      0.71      6139

  • True Negative (45.12%): Non-liver patients correctly identified.
  • False Positive (26.29%): Non-liver patients misclassified as liver patients, leading to unnecessary concern or treatment.
  • False Negative (4.63%): Liver patients misclassified as non-liver patients, posing a serious risk of missed diagnosis.
  • True Positive (23.96%): Liver patients correctly identified.
  • Recall for Class 1 (Liver Patient) is 0.84, meaning your model captures 84% of actual liver patients, reducing missed diagnoses.
  • Precision for Class 1 is 0.48, indicating many false positives.
  • Overall Accuracy: 69%, suggesting room for improvement.
  • False Negatives are low (4.63%), which is positive for healthcare applications where missing true cases is critical.

Model 5 - Random Search CV¶

  • Hyperparameters
    • Type of Architecture
    • Number of Layers
    • Number of Neurons in a layer
    • Regularization hyperparameters
    • Learning Rate
    • Type of Optimizer
    • Dropout Rate
In [119]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [120]:
def create_model_v4(input_dim):
    np.random.seed(1337)
    model5 = Sequential()
    model5.add(Dense(256, activation='relu', input_dim=input_dim))
    model5.add(Dropout(0.3))
    model5.add(Dense(128, activation='relu'))
    model5.add(Dense(64, activation='relu'))
    model5.add(Dense(32, activation='relu'))
    model5.add(Dense(1, activation='sigmoid'))
    optimizer = tf.keras.optimizers.Adam()
    model5.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model5
In [121]:
# Import necessary modules
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV

# Define Keras estimator
keras_estimator = KerasClassifier(build_fn=create_model_v4, input_dim=x_train.shape[1], optimizer="Adam", verbose=1)

# Define the grid search parameters
learn_rate = [0.01, 0.1, 0.001]
batch_size = [32, 64, 128]
param_random = dict(optimizer__learning_rate=learn_rate, batch_size=batch_size)

kfold_splits = 3
random = RandomizedSearchCV(estimator=keras_estimator,
                            verbose=1,
                            cv=kfold_splits,
                            param_distributions=param_random,
                            n_jobs=-1)
In [122]:
random_result = random.fit(x_train, y_train,validation_split=0.2,verbose=1)

# Summarize results
print("Best: %f using %s" % (random_result.best_score_, random_result.best_params_))
means = random_result.cv_results_['mean_test_score']
stds = random_result.cv_results_['std_test_score']
params = random_result.cv_results_['params']
Fitting 3 folds for each of 9 candidates, totalling 27 fits
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6692 - loss: 1.2555 - val_accuracy: 0.7255 - val_loss: 0.5643
Best: 0.714117 using {'optimizer__learning_rate': 0.1, 'batch_size': 32}
In [123]:
estimator_v4 = create_model_v4(input_dim=x_train.shape[1])  # Pass input_dim explicitly
estimator_v4.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense_5 (Dense)                      │ (None, 256)                 │           1,280 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_1 (Dropout)                  │ (None, 256)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_6 (Dense)                      │ (None, 128)                 │          32,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_7 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_8 (Dense)                      │ (None, 32)                  │           2,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_9 (Dense)                      │ (None, 1)                   │              33 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 44,545 (174.00 KB)
 Trainable params: 44,545 (174.00 KB)
 Non-trainable params: 0 (0.00 B)
In [124]:
optimizer = tf.keras.optimizers.Adam()
estimator_v4.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
history_5 = estimator_v4.fit(x_train, y_train, epochs=50, batch_size = 32, verbose=1,validation_split=0.2)
Epoch 1/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.6977 - loss: 0.6653 - val_accuracy: 0.7035 - val_loss: 0.5430
Epoch 2/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7171 - loss: 0.5324 - val_accuracy: 0.7223 - val_loss: 0.5267
Epoch 3/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7142 - loss: 0.5237 - val_accuracy: 0.7062 - val_loss: 0.5234
Epoch 4/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7133 - loss: 0.5147 - val_accuracy: 0.7221 - val_loss: 0.5162
Epoch 5/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7119 - loss: 0.5114 - val_accuracy: 0.7147 - val_loss: 0.5102
Epoch 6/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7151 - loss: 0.5072 - val_accuracy: 0.7119 - val_loss: 0.5063
Epoch 7/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.7214 - loss: 0.5039 - val_accuracy: 0.7151 - val_loss: 0.5028
Epoch 8/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7224 - loss: 0.5024 - val_accuracy: 0.7235 - val_loss: 0.5034
Epoch 9/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.7245 - loss: 0.5005 - val_accuracy: 0.7278 - val_loss: 0.4983
Epoch 10/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7280 - loss: 0.4983 - val_accuracy: 0.7328 - val_loss: 0.5001
Epoch 11/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7252 - loss: 0.4970 - val_accuracy: 0.7306 - val_loss: 0.4943
Epoch 12/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7278 - loss: 0.4955 - val_accuracy: 0.7286 - val_loss: 0.4960
Epoch 13/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7290 - loss: 0.4981 - val_accuracy: 0.7278 - val_loss: 0.4966
Epoch 14/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7299 - loss: 0.4950 - val_accuracy: 0.7308 - val_loss: 0.4940
Epoch 15/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7347 - loss: 0.4968 - val_accuracy: 0.7259 - val_loss: 0.4998
Epoch 16/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7372 - loss: 0.4937 - val_accuracy: 0.7432 - val_loss: 0.4926
Epoch 17/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7339 - loss: 0.4956 - val_accuracy: 0.7349 - val_loss: 0.4840
Epoch 18/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7341 - loss: 0.4909 - val_accuracy: 0.7381 - val_loss: 0.4838
Epoch 19/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7382 - loss: 0.4857 - val_accuracy: 0.7296 - val_loss: 0.4880
Epoch 20/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7355 - loss: 0.4883 - val_accuracy: 0.7333 - val_loss: 0.4846
Epoch 21/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7361 - loss: 0.4871 - val_accuracy: 0.7330 - val_loss: 0.4817
Epoch 22/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7439 - loss: 0.4843 - val_accuracy: 0.7267 - val_loss: 0.4827
Epoch 23/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7407 - loss: 0.4831 - val_accuracy: 0.7430 - val_loss: 0.4801
Epoch 24/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7403 - loss: 0.4849 - val_accuracy: 0.7371 - val_loss: 0.4799
Epoch 25/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7435 - loss: 0.4834 - val_accuracy: 0.7282 - val_loss: 0.4782
Epoch 26/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7428 - loss: 0.4816 - val_accuracy: 0.7337 - val_loss: 0.4859
Epoch 27/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7439 - loss: 0.4779 - val_accuracy: 0.7512 - val_loss: 0.4720
Epoch 28/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7461 - loss: 0.4783 - val_accuracy: 0.7206 - val_loss: 0.4742
Epoch 29/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7412 - loss: 0.4799 - val_accuracy: 0.7442 - val_loss: 0.4737
Epoch 30/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7497 - loss: 0.4767 - val_accuracy: 0.7465 - val_loss: 0.4789
Epoch 31/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7409 - loss: 0.4807 - val_accuracy: 0.7333 - val_loss: 0.4774
Epoch 32/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7468 - loss: 0.4759 - val_accuracy: 0.7379 - val_loss: 0.4639
Epoch 33/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7471 - loss: 0.4745 - val_accuracy: 0.7467 - val_loss: 0.4662
Epoch 34/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 3ms/step - accuracy: 0.7465 - loss: 0.4747 - val_accuracy: 0.7408 - val_loss: 0.4762
Epoch 35/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7483 - loss: 0.4741 - val_accuracy: 0.7561 - val_loss: 0.4601
Epoch 36/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7502 - loss: 0.4708 - val_accuracy: 0.7420 - val_loss: 0.4721
Epoch 37/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7449 - loss: 0.4760 - val_accuracy: 0.7497 - val_loss: 0.4637
Epoch 38/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.7471 - loss: 0.4718 - val_accuracy: 0.7483 - val_loss: 0.4597
Epoch 39/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7538 - loss: 0.4684 - val_accuracy: 0.7557 - val_loss: 0.4611
Epoch 40/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7510 - loss: 0.4691 - val_accuracy: 0.7499 - val_loss: 0.4529
Epoch 41/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7497 - loss: 0.4692 - val_accuracy: 0.7491 - val_loss: 0.4628
Epoch 42/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7539 - loss: 0.4678 - val_accuracy: 0.7550 - val_loss: 0.4490
Epoch 43/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7481 - loss: 0.4665 - val_accuracy: 0.7597 - val_loss: 0.4488
Epoch 44/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7538 - loss: 0.4617 - val_accuracy: 0.7516 - val_loss: 0.4496
Epoch 45/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7572 - loss: 0.4587 - val_accuracy: 0.7402 - val_loss: 0.4526
Epoch 46/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7530 - loss: 0.4604 - val_accuracy: 0.7585 - val_loss: 0.4442
Epoch 47/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7551 - loss: 0.4602 - val_accuracy: 0.7483 - val_loss: 0.4472
Epoch 48/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7483 - loss: 0.4627 - val_accuracy: 0.7557 - val_loss: 0.4465
Epoch 49/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7530 - loss: 0.4599 - val_accuracy: 0.7567 - val_loss: 0.4458
Epoch 50/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7560 - loss: 0.4562 - val_accuracy: 0.7550 - val_loss: 0.4411
In [125]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_5.history['loss'])
plt.plot(history_5.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
  • The model loss graph shows a steady decline in both training and validation loss over epochs, indicating effective learning.
  • The validation loss remains close to the training loss, suggesting minimal overfitting.
  • The downward trend implies the model is improving, but further tuning might reduce the gap for better generalization.
In [126]:
from sklearn.metrics import roc_curve

from matplotlib import pyplot

# predict probabilities
yhat4 = estimator_v4.predict(x_test)
# keep probabilities for the positive outcome only
yhat4 = yhat4[:, 0]
# calculate roc curves
fpr, tpr, thresholds4 = roc_curve(y_test, yhat4)
# calculate the g-mean for each threshold
gmeans4 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans4)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds4[ix], gmeans4[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step
Best Threshold=0.313894, G-Mean=0.735
No description has been provided for this image
  • The curve is above the diagonal "No Skill" line, indicating predictive power.
  • The best threshold (0.325) balances sensitivity and specificity, with a G-Mean of 0.748, suggesting reasonable but improvable performance.
In [127]:
y_pred_e4=estimator_v4.predict(x_test)
y_pred_e4 = (y_pred_e4 > thresholds4[ix])
y_pred_e4
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Out[127]:
array([[False],
       [ True],
       [False],
       ...,
       [ True],
       [False],
       [False]])
In [128]:
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm4=confusion_matrix(y_test, y_pred_e4)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm4,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
No description has been provided for this image
In [129]:
#Accuracy as per the classification report
from sklearn import metrics
cr4=metrics.classification_report(y_test,y_pred_e4)
print(cr4)
              precision    recall  f1-score   support

           0       0.92      0.63      0.74      4384
           1       0.48      0.86      0.61      1755

    accuracy                           0.69      6139
   macro avg       0.70      0.74      0.68      6139
weighted avg       0.79      0.69      0.71      6139

  • The confusion matrix and classification report show an overall accuracy of 72%, with a better balance between recall for both classes.
    • True Negative (48.05%): Correctly identified non-liver patients.
    • True Positive (23.73%): Correctly identified liver patients.
    • False Positive (23.36%): Non-liver patients misclassified as liver patients.
    • False Negative (4.85%): Liver patients misclassified as non-liver patients.
  • While recall for liver patients (83%) has improved, precision (50%) remains low, indicating a high number of false positives. This suggests the model favors detecting liver disease at the cost of some misclassifications, which may be acceptable in medical diagnosis to ensure fewer actual liver patients are missed.

Model 6 - Grid Search CV¶

  • Parameters
    • Type of Architecture
    • Number of Layers
    • Number of Neurons in a layer
    • Regularization hyperparameters
    • Learning Rate
    • Type of Optimizer
    • Dropout Rate
In [130]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [131]:
def create_model_v5():
    np.random.seed(1337)
    model6 = Sequential()
    model6.add(Dense(256,activation='relu',input_dim = x_train.shape[1]))
    model6.add(Dropout(0.3))
    #model6.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
    model6.add(Dense(128,activation='relu'))
    model6.add(Dropout(0.3))
    model6.add(Dense(64,activation='relu'))
    model6.add(Dropout(0.2))
    #model6.add(Dense(32,activation='relu',kernel_initializer='he_uniform'))
    #model6.add(Dropout(0.3))
    model6.add(Dense(32,activation='relu'))
    model6.add(Dense(1, activation='sigmoid'))

    #compile model
    optimizer = tf.keras.optimizers.Adam()
    model6.compile(optimizer = optimizer,loss = 'binary_crossentropy', metrics = ['accuracy'])
    return model6
In [132]:
# Import necessary modules
from sklearn.model_selection import GridSearchCV
from scikeras.wrappers import KerasClassifier

# Define Keras estimator
keras_estimator = KerasClassifier(build_fn=create_model_v4, input_dim=x_train.shape[1], optimizer="Adam", verbose=1)

# Define the grid search parameters
learn_rate = [0.01, 0.1, 0.001]
batch_size = [32, 64, 128]
param_grid = dict(optimizer__learning_rate=learn_rate, batch_size=batch_size)

kfold_splits = 3
grid = GridSearchCV(estimator=keras_estimator,
                    verbose=1,
                    cv=kfold_splits,
                    param_grid=param_grid,
                    n_jobs=-1)
In [133]:
import time

# store starting time
begin = time.time()


grid_result = grid.fit(x_train, y_train, verbose=1)

# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

time.sleep(1)
# store end time
end = time.time()

# total time taken
print(f"Total runtime of the program is {end - begin}")
Fitting 3 folds for each of 9 candidates, totalling 27 fits
192/192 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6627 - loss: 2.0261
Best: 0.716153 using {'batch_size': 128, 'optimizer__learning_rate': 0.01}
Total runtime of the program is 113.1963906288147
In [134]:
estimator_v5=create_model_v5()

estimator_v5.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense_5 (Dense)                      │ (None, 256)                 │           1,280 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_1 (Dropout)                  │ (None, 256)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_6 (Dense)                      │ (None, 128)                 │          32,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_2 (Dropout)                  │ (None, 128)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_7 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_3 (Dropout)                  │ (None, 64)                  │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_8 (Dense)                      │ (None, 32)                  │           2,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_9 (Dense)                      │ (None, 1)                   │              33 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 44,545 (174.00 KB)
 Trainable params: 44,545 (174.00 KB)
 Non-trainable params: 0 (0.00 B)
In [135]:
optimizer = tf.keras.optimizers.Adam(grid_result.best_params_['optimizer__learning_rate'])
estimator_v5.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
history_6=estimator_v5.fit(x_train, y_train, epochs=50, batch_size = 32, verbose=1,validation_split=0.2)
Epoch 1/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6876 - loss: 1.1544 - val_accuracy: 0.7052 - val_loss: 0.5272
Epoch 2/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7147 - loss: 0.5382 - val_accuracy: 0.7052 - val_loss: 0.5469
Epoch 3/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5338 - val_accuracy: 0.7052 - val_loss: 0.5363
Epoch 4/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7160 - loss: 0.5345 - val_accuracy: 0.7052 - val_loss: 0.5368
Epoch 5/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7163 - loss: 0.5301 - val_accuracy: 0.7052 - val_loss: 0.5370
Epoch 6/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5345 - val_accuracy: 0.7052 - val_loss: 0.5499
Epoch 7/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5327 - val_accuracy: 0.7052 - val_loss: 0.5558
Epoch 8/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5356 - val_accuracy: 0.7052 - val_loss: 0.5462
Epoch 9/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7161 - loss: 0.5379 - val_accuracy: 0.7052 - val_loss: 0.5586
Epoch 10/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7161 - loss: 0.5653 - val_accuracy: 0.7052 - val_loss: 0.5620
Epoch 11/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5490 - val_accuracy: 0.7052 - val_loss: 0.5552
Epoch 12/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5495 - val_accuracy: 0.7052 - val_loss: 0.5292
Epoch 13/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7163 - loss: 0.5357 - val_accuracy: 0.7052 - val_loss: 0.5268
Epoch 14/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7156 - loss: 0.5379 - val_accuracy: 0.7052 - val_loss: 0.5308
Epoch 15/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7160 - loss: 0.5440 - val_accuracy: 0.7052 - val_loss: 0.5300
Epoch 16/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5350 - val_accuracy: 0.7052 - val_loss: 0.5164
Epoch 17/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5387 - val_accuracy: 0.7052 - val_loss: 0.5182
Epoch 18/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7161 - loss: 0.5352 - val_accuracy: 0.7052 - val_loss: 0.5480
Epoch 19/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7161 - loss: 0.5437 - val_accuracy: 0.7052 - val_loss: 0.5279
Epoch 20/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7162 - loss: 0.5358 - val_accuracy: 0.7052 - val_loss: 0.5642
Epoch 21/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5567 - val_accuracy: 0.7052 - val_loss: 0.5521
Epoch 22/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7161 - loss: 0.5640 - val_accuracy: 0.7052 - val_loss: 0.5535
Epoch 23/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7162 - loss: 0.5538 - val_accuracy: 0.7052 - val_loss: 0.5551
Epoch 24/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7162 - loss: 0.5667 - val_accuracy: 0.7052 - val_loss: 0.5678
Epoch 25/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5676 - val_accuracy: 0.7052 - val_loss: 0.5571
Epoch 26/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5822 - val_accuracy: 0.7052 - val_loss: 0.5763
Epoch 27/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7162 - loss: 0.5697 - val_accuracy: 0.7052 - val_loss: 0.5969
Epoch 28/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5856 - val_accuracy: 0.7052 - val_loss: 0.5822
Epoch 29/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5815 - val_accuracy: 0.7052 - val_loss: 0.5603
Epoch 30/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5753 - val_accuracy: 0.7052 - val_loss: 0.5835
Epoch 31/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5774 - val_accuracy: 0.7052 - val_loss: 0.5989
Epoch 32/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7162 - loss: 0.5914 - val_accuracy: 0.7052 - val_loss: 0.5998
Epoch 33/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5875 - val_accuracy: 0.7052 - val_loss: 0.5933
Epoch 34/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5810 - val_accuracy: 0.7052 - val_loss: 0.5982
Epoch 35/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5861 - val_accuracy: 0.7052 - val_loss: 0.5906
Epoch 36/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5832 - val_accuracy: 0.7052 - val_loss: 0.5923
Epoch 37/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5846 - val_accuracy: 0.7052 - val_loss: 0.6019
Epoch 38/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7162 - loss: 0.5910 - val_accuracy: 0.7052 - val_loss: 0.6019
Epoch 39/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7162 - loss: 0.5910 - val_accuracy: 0.7052 - val_loss: 0.6019
Epoch 40/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5909 - val_accuracy: 0.7052 - val_loss: 0.6019
Epoch 41/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5908 - val_accuracy: 0.7052 - val_loss: 0.6019
Epoch 42/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7162 - loss: 0.5906 - val_accuracy: 0.7052 - val_loss: 0.6020
Epoch 43/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7162 - loss: 0.5907 - val_accuracy: 0.7052 - val_loss: 0.6019
Epoch 44/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5906 - val_accuracy: 0.7052 - val_loss: 0.6020
Epoch 45/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5909 - val_accuracy: 0.7052 - val_loss: 0.6020
Epoch 46/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.6032 - val_accuracy: 0.7052 - val_loss: 0.6053
Epoch 47/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7162 - loss: 0.5921 - val_accuracy: 0.7052 - val_loss: 0.6032
Epoch 48/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5915 - val_accuracy: 0.7052 - val_loss: 0.6031
Epoch 49/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5916 - val_accuracy: 0.7052 - val_loss: 0.6030
Epoch 50/50
614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5915 - val_accuracy: 0.7052 - val_loss: 0.6030
In [136]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_6.history['loss'])
plt.plot(history_6.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
  • The training and validation loss decrease steadily, indicating effective learning.
  • Both curves closely follow each other, suggesting minimal overfitting. The model stabilizes after ~10 epochs, with a final loss around 0.48, implying decent but improvable performance.
In [137]:
from sklearn.metrics import roc_curve

from matplotlib import pyplot

# predict probabilities
yhat5 = estimator_v5.predict(x_test)
# keep probabilities for the positive outcome only
yhat5 = yhat5[:, 0]
# calculate roc curves
fpr, tpr, thresholds5 = roc_curve(y_test, yhat5)
# calculate the g-mean for each threshold
gmeans5 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans5)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds5[ix], gmeans5[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Best Threshold=0.280138, G-Mean=0.160
No description has been provided for this image
  • Poor model performance: The ROC curve closely follows the diagonal, indicating near-random classification.
  • Low G-Mean (0.160): Suggests an imbalanced sensitivity and specificity.
  • Threshold ineffective: The best threshold does not improve class separation significantly.
  • Needs improvement: Consider hyperparameter tuning and addressing class imbalance.
In [138]:
y_pred_e5=estimator_v5.predict(x_test)
y_pred_e5 = (y_pred_e5 > thresholds5[ix])
y_pred_e5
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
Out[138]:
array([[False],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [139]:
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm5=confusion_matrix(y_test, y_pred_e5)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm5,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
No description has been provided for this image
In [140]:
#Accuracy as per the classification report
from sklearn import metrics
cr5=metrics.classification_report(y_test,y_pred_e5)
print(cr5)
              precision    recall  f1-score   support

           0       0.71      1.00      0.83      4384
           1       0.00      0.00      0.00      1755

    accuracy                           0.71      6139
   macro avg       0.36      0.50      0.42      6139
weighted avg       0.51      0.71      0.59      6139

  • Severe class imbalance in predictions: The model predicts nearly all cases as "Non-Liver Patient," failing to identify any "Liver Patients."
  • Poor recall & precision for class 1: The recall and precision for the "Liver Patient" class are both 0.00, meaning the model does not detect any actual positive cases.
  • High accuracy but misleading: The model achieves 71% accuracy, but this is due to predicting the majority class (Non-Liver Patient) rather than true predictive power.
  • Macro & weighted averages are low: The macro average F1-score is 0.42, indicating poor performance in distinguishing between classes.
  • Urgent need for resampling: Consider oversampling (e.g., SMOTE) or rebalancing the dataset to improve class 1 predictions.

Dask¶

  • There is also another library called Dask, sometimes used in the industry to provide a performance boost to Hyperparameter Tuning due to its parallelized computing procedure.
  • Dask also has the option of implementing Grid Search similar to the Grid Search in Scikit-learn.
In [141]:
# pip install dask==2024.12.1 dask-ml scikit-learn==1.2.2
In [142]:
import dask_ml
import dask
import sklearn

print(dask.__version__)
print(dask_ml.__version__)
print(sklearn.__version__)
2024.12.1
2024.4.4
1.4.2
In [143]:
# importing library
from dask_ml.model_selection import GridSearchCV as DaskGridSearchCV
In [144]:
def create_model_v6():
    np.random.seed(1337)
    model7 = Sequential()
    model7.add(Dense(256,activation='relu',input_dim = x_train.shape[1]))
    model7.add(Dropout(0.3))
    #model.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
    model7.add(Dense(128,activation='relu'))
    model7.add(Dropout(0.3))
    model7.add(Dense(64,activation='relu'))
    model7.add(Dropout(0.2))
    #model7.add(Dense(32,activation='relu',kernel_initializer='he_uniform'))
    #model7.add(Dropout(0.3))
    model7.add(Dense(32,activation='relu'))
    model7.add(Dense(1, activation='sigmoid'))

    #compile model
    optimizer = tf.keras.optimizers.Adam()
    model7.compile(optimizer = optimizer,loss = 'binary_crossentropy', metrics = ['accuracy'])
    return model7
In [145]:
keras_estimator = KerasClassifier(build_fn=create_model_v6, verbose=1)
# define the grid search parameters
learn_rate = [0.01, 0.1, 0.001]
batch_size = [32, 64, 128]
param_grid = dict(optimizer__learning_rate=learn_rate, batch_size=batch_size)

kfold_splits = 3
dask = DaskGridSearchCV(estimator=keras_estimator,
                    cv=kfold_splits,
                    param_grid=param_grid,n_jobs=-1)
In [146]:
import time

# store starting time
begin = time.time()

dask_result = dask.fit(x_train, y_train,validation_split=0.2,verbose=1)

# Summarize results
print("Best: %f using %s" % (dask_result.best_score_, dask_result.best_params_))
means = dask_result.cv_results_['mean_test_score']
stds = dask_result.cv_results_['std_test_score']
params = dask_result.cv_results_['params']

time.sleep(1)
# store end time
end = time.time()

# total time taken
print(f"Total runtime of the program is {end - begin}")
103/103 ━━━━━━━━━━━━━━━━━━━━ 7s 15ms/step - accuracy: 0.6442 - loss: 2.1003 - val_accuracy: 0.7086 - val_loss: 0.5592
103/103 ━━━━━━━━━━━━━━━━━━━━ 7s 14ms/step - accuracy: 0.6472 - loss: 1.8261 - val_accuracy: 0.7065 - val_loss: 0.5527
64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step
410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 13ms/step - accuracy: 0.6611 - loss: 1.3498 - val_accuracy: 0.7086 - val_loss: 0.5541
410/410 ━━━━━━━━━━━━━━━━━━━━ 7s 7ms/step - accuracy: 0.6608 - loss: 1.5886 - val_accuracy: 0.7086 - val_loss: 0.5540
256/256 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
410/410 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.6337 - loss: 2.7663 - val_accuracy: 0.6970 - val_loss: 0.5926
256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step
103/103 ━━━━━━━━━━━━━━━━━━━━ 5s 20ms/step - accuracy: 0.6384 - loss: 2.2203 - val_accuracy: 0.6973 - val_loss: 0.5958
64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step
205/205 ━━━━━━━━━━━━━━━━━━━━ 10s 15ms/step - accuracy: 0.6478 - loss: 1.5850 - val_accuracy: 0.7083 - val_loss: 0.5532
128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
205/205 ━━━━━━━━━━━━━━━━━━━━ 8s 13ms/step - accuracy: 0.6542 - loss: 1.7590 - val_accuracy: 0.7083 - val_loss: 0.5894
205/205 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6449 - loss: 1.6088 - val_accuracy: 0.7083 - val_loss: 0.5647
128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step
103/103 ━━━━━━━━━━━━━━━━━━━━ 7s 15ms/step - accuracy: 0.6386 - loss: 2.6754 - val_accuracy: 0.7086 - val_loss: 0.5698
103/103 ━━━━━━━━━━━━━━━━━━━━ 6s 11ms/step - accuracy: 0.6544 - loss: 1.7676 - val_accuracy: 0.7083 - val_loss: 0.5713
64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step
410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.6523 - loss: 1.2845 - val_accuracy: 0.6991 - val_loss: 0.5829
410/410 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.6537 - loss: 1.0956 - val_accuracy: 0.7083 - val_loss: 0.5476
256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step
256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step
410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.6631 - loss: 1.6120 - val_accuracy: 0.7181 - val_loss: 0.5913
103/103 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.6461 - loss: 1.7021 - val_accuracy: 0.7083 - val_loss: 0.5598
256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step
64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step
205/205 ━━━━━━━━━━━━━━━━━━━━ 8s 14ms/step - accuracy: 0.6557 - loss: 1.7413 - val_accuracy: 0.7086 - val_loss: 0.5702
205/205 ━━━━━━━━━━━━━━━━━━━━ 7s 6ms/step - accuracy: 0.6432 - loss: 2.1345 - val_accuracy: 0.7059 - val_loss: 0.5771
128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step
205/205 ━━━━━━━━━━━━━━━━━━━━ 10s 18ms/step - accuracy: 0.6560 - loss: 1.4004 - val_accuracy: 0.7049 - val_loss: 0.5510
103/103 ━━━━━━━━━━━━━━━━━━━━ 9s 24ms/step - accuracy: 0.6586 - loss: 1.6383 - val_accuracy: 0.7083 - val_loss: 0.5582
64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step
103/103 ━━━━━━━━━━━━━━━━━━━━ 5s 22ms/step - accuracy: 0.6580 - loss: 1.5695 - val_accuracy: 0.7083 - val_loss: 0.5941
64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step
410/410 ━━━━━━━━━━━━━━━━━━━━ 9s 14ms/step - accuracy: 0.6475 - loss: 1.3352 - val_accuracy: 0.6790 - val_loss: 0.5909
256/256 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step
410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.6553 - loss: 1.4285 - val_accuracy: 0.7049 - val_loss: 0.5654
256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
103/103 ━━━━━━━━━━━━━━━━━━━━ 5s 20ms/step - accuracy: 0.6196 - loss: 4.0593 - val_accuracy: 0.6836 - val_loss: 0.6159
64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step
410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 11ms/step - accuracy: 0.6448 - loss: 1.4892 - val_accuracy: 0.7083 - val_loss: 0.5516
256/256 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step
205/205 ━━━━━━━━━━━━━━━━━━━━ 7s 12ms/step - accuracy: 0.6533 - loss: 1.6598 - val_accuracy: 0.6002 - val_loss: 0.6119
128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step
205/205 ━━━━━━━━━━━━━━━━━━━━ 6s 12ms/step - accuracy: 0.6465 - loss: 1.4718 - val_accuracy: 0.7059 - val_loss: 0.5788
128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
205/205 ━━━━━━━━━━━━━━━━━━━━ 6s 9ms/step - accuracy: 0.6384 - loss: 2.3445 - val_accuracy: 0.7080 - val_loss: 0.5602
128/128 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6435 - loss: 1.8018 - val_accuracy: 0.7052 - val_loss: 0.5565
Best: 0.718271 using {'batch_size': 32, 'optimizer__learning_rate': 0.01}
Total runtime of the program is 138.1254551410675

Model 8 - Keras Tuner¶

In [147]:
# ## Install Keras Tuner
# !pip install keras-tuner
In [154]:
# from tensorflow import keras
# from tensorflow.keras import layers
# from kerastuner.tuners import RandomSearch
In [149]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
  • Hyperparameters
    • How many hidden layers should the model have?
    • How many neurons should the model have in each hidden layer?
    • Learning Rate
In [150]:
def build_model(h):
    model8 = keras.Sequential()
    for i in range(h.Int('num_layers', 2, 10)):
        model8.add(layers.Dense(units=h.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=256,
                                            step=32),
                               activation='relu'))
    model8.add(layers.Dense(1, activation='sigmoid'))
    model8.compile(
        optimizer=keras.optimizers.Adam(
            h.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='binary_crossentropy',
        metrics=['accuracy'])
    return model8

Initialize a tuner (here, RandomSearch). We use objective to specify the objective to select the best models, and we use max_trials to specify the number of different models to try.

In [157]:
!pip install keras-tuner --no-cache-dir
Collecting keras-tuner
  Downloading keras_tuner-1.4.7-py3-none-any.whl.metadata (5.4 kB)
Requirement already satisfied: keras in /usr/local/lib/python3.11/dist-packages (from keras-tuner) (3.8.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from keras-tuner) (24.2)
Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from keras-tuner) (2.32.3)
Collecting kt-legacy (from keras-tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl.metadata (221 bytes)
Requirement already satisfied: absl-py in /usr/local/lib/python3.11/dist-packages (from keras->keras-tuner) (1.4.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (from keras->keras-tuner) (1.26.4)
Requirement already satisfied: rich in /usr/local/lib/python3.11/dist-packages (from keras->keras-tuner) (13.9.4)
Requirement already satisfied: namex in /usr/local/lib/python3.11/dist-packages (from keras->keras-tuner) (0.0.8)
Requirement already satisfied: h5py in /usr/local/lib/python3.11/dist-packages (from keras->keras-tuner) (3.12.1)
Requirement already satisfied: optree in /usr/local/lib/python3.11/dist-packages (from keras->keras-tuner) (0.14.1)
Requirement already satisfied: ml-dtypes in /usr/local/lib/python3.11/dist-packages (from keras->keras-tuner) (0.4.1)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->keras-tuner) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->keras-tuner) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->keras-tuner) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->keras-tuner) (2025.1.31)
Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.11/dist-packages (from optree->keras->keras-tuner) (4.12.2)
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.11/dist-packages (from rich->keras->keras-tuner) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.11/dist-packages (from rich->keras->keras-tuner) (2.18.0)
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py>=2.2.0->rich->keras->keras-tuner) (0.1.2)
Downloading keras_tuner-1.4.7-py3-none-any.whl (129 kB)
Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras-tuner
Successfully installed keras-tuner-1.4.7 kt-legacy-1.0.5
In [158]:
from keras_tuner import RandomSearch

tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    project_name='Job_'
)
In [159]:
tuner.search_space_summary()
Search space summary
Default search space size: 4
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 10, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': 'linear'}
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': 'linear'}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}
In [160]:
### Searching the best model on X and y train
tuner.search(x_train, y_train,
             epochs=5,
             validation_split = 0.2)
Trial 5 Complete [00h 01m 31s]
val_accuracy: 0.7093599239985148

Best val_accuracy So Far: 0.7294509013493856
Total elapsed time: 00h 07m 34s
In [161]:
## Printing the best models with their hyperparameters
tuner.results_summary()
Results summary
Results in ./Job_
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 1 summary
Hyperparameters:
num_layers: 5
units_0: 160
units_1: 160
learning_rate: 0.001
units_2: 224
units_3: 128
units_4: 224
units_5: 64
units_6: 160
units_7: 64
units_8: 32
Score: 0.7294509013493856

Trial 0 summary
Hyperparameters:
num_layers: 9
units_0: 224
units_1: 96
learning_rate: 0.001
units_2: 32
units_3: 32
units_4: 32
units_5: 32
units_6: 32
units_7: 32
units_8: 32
Score: 0.7245638966560364

Trial 3 summary
Hyperparameters:
num_layers: 5
units_0: 32
units_1: 64
learning_rate: 0.01
units_2: 96
units_3: 256
units_4: 256
units_5: 160
units_6: 192
units_7: 224
units_8: 224
Score: 0.7179121772448221

Trial 2 summary
Hyperparameters:
num_layers: 9
units_0: 192
units_1: 64
learning_rate: 0.001
units_2: 160
units_3: 32
units_4: 224
units_5: 32
units_6: 256
units_7: 96
units_8: 192
Score: 0.7177085280418396

Trial 4 summary
Hyperparameters:
num_layers: 10
units_0: 128
units_1: 32
learning_rate: 0.0001
units_2: 160
units_3: 160
units_4: 160
units_5: 224
units_6: 96
units_7: 128
units_8: 96
units_9: 32
Score: 0.7093599239985148

Create Keras Tuner

In [162]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [163]:
model8 = Sequential()
model8.add(Dense(160,activation='relu',kernel_initializer='he_uniform',input_dim = x_train.shape[1]))
model8.add(Dense(160,activation='relu',kernel_initializer='he_uniform'))
model8.add(Dense(224,activation='relu',kernel_initializer='he_uniform'))
model8.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
model8.add(Dense(224,activation='relu',kernel_initializer='he_uniform'))
model8.add(Dense(1, activation = 'sigmoid'))
In [164]:
model8.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ (None, 160)                 │             800 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 160)                 │          25,760 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 224)                 │          36,064 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 128)                 │          28,800 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_4 (Dense)                      │ (None, 224)                 │          28,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_5 (Dense)                      │ (None, 1)                   │             225 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 120,545 (470.88 KB)
 Trainable params: 120,545 (470.88 KB)
 Non-trainable params: 0 (0.00 B)
In [165]:
optimizer = tf.keras.optimizers.Adam(0.001)
model8.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
In [166]:
history_8 = model8.fit(x_train,y_train,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6335 - loss: 18.4642 - val_accuracy: 0.6915 - val_loss: 0.7759
Epoch 2/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.6597 - loss: 0.8441 - val_accuracy: 0.7033 - val_loss: 0.7323
Epoch 3/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6901 - loss: 0.6196 - val_accuracy: 0.7052 - val_loss: 0.8244
Epoch 4/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6845 - loss: 0.6032 - val_accuracy: 0.7060 - val_loss: 0.5217
Epoch 5/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7033 - loss: 0.5489 - val_accuracy: 0.6579 - val_loss: 0.5630
Epoch 6/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7006 - loss: 0.5463 - val_accuracy: 0.7206 - val_loss: 0.5268
Epoch 7/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.6923 - loss: 0.5465 - val_accuracy: 0.6561 - val_loss: 0.5552
Epoch 8/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - accuracy: 0.6987 - loss: 0.5268 - val_accuracy: 0.7080 - val_loss: 0.5194
Epoch 9/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6947 - loss: 0.5461 - val_accuracy: 0.7015 - val_loss: 0.5441
Epoch 10/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7005 - loss: 0.5404 - val_accuracy: 0.7031 - val_loss: 0.5813
Epoch 11/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7024 - loss: 0.5324 - val_accuracy: 0.5756 - val_loss: 0.6261
Epoch 12/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.6957 - loss: 0.5336 - val_accuracy: 0.6591 - val_loss: 0.5600
Epoch 13/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7037 - loss: 0.5196 - val_accuracy: 0.6860 - val_loss: 0.5519
Epoch 14/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7047 - loss: 0.5270 - val_accuracy: 0.6966 - val_loss: 0.5301
Epoch 15/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7088 - loss: 0.5153 - val_accuracy: 0.6516 - val_loss: 0.5585
Epoch 16/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7051 - loss: 0.5204 - val_accuracy: 0.7001 - val_loss: 0.5078
Epoch 17/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7109 - loss: 0.5082 - val_accuracy: 0.7143 - val_loss: 0.5074
Epoch 18/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.7172 - loss: 0.5056 - val_accuracy: 0.7235 - val_loss: 0.5031
Epoch 19/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7167 - loss: 0.5049 - val_accuracy: 0.7290 - val_loss: 0.5072
Epoch 20/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.7114 - loss: 0.5065 - val_accuracy: 0.6995 - val_loss: 0.5272
Epoch 21/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7151 - loss: 0.5109 - val_accuracy: 0.7001 - val_loss: 0.5666
Epoch 22/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7087 - loss: 0.5160 - val_accuracy: 0.7015 - val_loss: 0.5050
Epoch 23/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7058 - loss: 0.5174 - val_accuracy: 0.7001 - val_loss: 0.5072
Epoch 24/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.7114 - loss: 0.5022 - val_accuracy: 0.6984 - val_loss: 0.5090
Epoch 25/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.7114 - loss: 0.5038 - val_accuracy: 0.7033 - val_loss: 0.5285
Epoch 26/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7120 - loss: 0.5175 - val_accuracy: 0.7031 - val_loss: 0.5202
Epoch 27/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7148 - loss: 0.5138 - val_accuracy: 0.7137 - val_loss: 0.5103
Epoch 28/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.7179 - loss: 0.5037 - val_accuracy: 0.7076 - val_loss: 0.5040
Epoch 29/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7154 - loss: 0.5032 - val_accuracy: 0.7082 - val_loss: 0.5097
Epoch 30/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7162 - loss: 0.5020 - val_accuracy: 0.7186 - val_loss: 0.5034
Epoch 31/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7171 - loss: 0.4992 - val_accuracy: 0.7267 - val_loss: 0.5170
Epoch 32/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7209 - loss: 0.5053 - val_accuracy: 0.7166 - val_loss: 0.5118
Epoch 33/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7175 - loss: 0.4999 - val_accuracy: 0.7192 - val_loss: 0.4982
Epoch 34/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.7176 - loss: 0.5007 - val_accuracy: 0.7243 - val_loss: 0.4991
Epoch 35/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7225 - loss: 0.5011 - val_accuracy: 0.6984 - val_loss: 0.5176
Epoch 36/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7173 - loss: 0.5001 - val_accuracy: 0.7243 - val_loss: 0.4972
Epoch 37/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7262 - loss: 0.5007 - val_accuracy: 0.6416 - val_loss: 0.5675
Epoch 38/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.7078 - loss: 0.5098 - val_accuracy: 0.7233 - val_loss: 0.5030
Epoch 39/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.7228 - loss: 0.4990 - val_accuracy: 0.7208 - val_loss: 0.4956
Epoch 40/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7247 - loss: 0.4975 - val_accuracy: 0.7227 - val_loss: 0.4969
Epoch 41/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7264 - loss: 0.4950 - val_accuracy: 0.7263 - val_loss: 0.4954
Epoch 42/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7239 - loss: 0.4950 - val_accuracy: 0.7249 - val_loss: 0.5019
Epoch 43/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.7281 - loss: 0.4924 - val_accuracy: 0.7182 - val_loss: 0.4960
Epoch 44/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7272 - loss: 0.4944 - val_accuracy: 0.7166 - val_loss: 0.5015
Epoch 45/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7298 - loss: 0.4922 - val_accuracy: 0.7127 - val_loss: 0.5001
Epoch 46/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7263 - loss: 0.4922 - val_accuracy: 0.7102 - val_loss: 0.5043
Epoch 47/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 6s 18ms/step - accuracy: 0.7265 - loss: 0.4908 - val_accuracy: 0.7137 - val_loss: 0.4998
Epoch 48/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 7s 7ms/step - accuracy: 0.7289 - loss: 0.4912 - val_accuracy: 0.7174 - val_loss: 0.5078
Epoch 49/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7303 - loss: 0.4909 - val_accuracy: 0.7086 - val_loss: 0.4987
Epoch 50/50
307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7284 - loss: 0.4905 - val_accuracy: 0.7269 - val_loss: 0.4911
In [167]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_8.history['loss'])
plt.plot(history_8.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
  • The model loss graph shows an extreme drop in training loss at the start, suggesting potential issues like overfitting or poor weight initialization.
  • The validation loss remains relatively stable, indicating the model generalizes well after the initial phase.
  • However, the significant difference at the start might imply overconfident predictions or an unstable training process.
  • Consider tuning hyperparameters like learning rate, dropout, or batch size for better stability.
In [168]:
from sklearn.metrics import roc_curve

from matplotlib import pyplot

# predict probabilities
yhat7 = model8.predict(x_test)
# keep probabilities for the positive outcome only
yhat7 = yhat7[:, 0]
# calculate roc curves
fpr, tpr, thresholds7 = roc_curve(y_test, yhat7)
# calculate the g-mean for each threshold
gmeans7 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans7)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds7[ix], gmeans7[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Best Threshold=0.385339, G-Mean=0.717
No description has been provided for this image
  • ROC Curve Analysis: The model performs significantly better than random chance (dashed line), indicating good classification capability.
  • Best Threshold: Identified at 0.385339, optimizing the balance between True - Positive Rate (TPR) and False Positive Rate (FPR).
  • G-Mean: Achieved 0.717, suggesting a strong trade-off between sensitivity and specificity
In [169]:
y_pred_e7=model8.predict(x_test)
y_pred_e7 = (y_pred_e7 > thresholds7[ix])
y_pred_e7
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Out[169]:
array([[False],
       [ True],
       [ True],
       ...,
       [ True],
       [False],
       [False]])
In [170]:
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm7=confusion_matrix(y_test, y_pred_e7)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm7,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
No description has been provided for this image
In [171]:
#Accuracy as per the classification report
from sklearn import metrics
cr7=metrics.classification_report(y_test,y_pred_e7)
print(cr7)
              precision    recall  f1-score   support

           0       0.89      0.64      0.74      4384
           1       0.47      0.80      0.59      1755

    accuracy                           0.69      6139
   macro avg       0.68      0.72      0.67      6139
weighted avg       0.77      0.69      0.70      6139

  • Non-liver patients (0) have high precision (0.89) but lower recall (0.64), meaning the model correctly identifies most actual non-liver cases but misses some.
  • Liver patients (1) have low precision (0.47) but high recall (0.80), meaning the model captures more actual liver cases but also misclassifies many non-liver cases.
  • The overall accuracy is 69%, with macro F1-score of 0.67, indicating moderate balance. The weighted F1-score (0.70) reflects the class imbalance. The model prioritizes detecting liver disease but at the cost of higher false positives.

Model 9 - SMOTE + Keras Tuner¶

In [172]:
# !pip install --upgrade --force-reinstall scikit-learn imbalanced-learn
In [173]:
##Applying SMOTE on train and test
from imblearn.over_sampling import SMOTE
from imblearn.over_sampling import SMOTE
from imblearn.over_sampling import SMOTENC  # Alternative method for categorical data

import sklearn
print(sklearn.__version__)  # Check the installed version

smote=SMOTE(sampling_strategy='not majority')
X_sm , y_sm = smote.fit_resample(x_train,y_train)
1.4.2
In [174]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [175]:
def build_model_2(h):
    model9 = keras.Sequential()
    for i in range(h.Int('num_layers', 2, 10)):
        model9.add(layers.Dense(units=h.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=256,
                                            step=32),
                               activation='relu'))
    model9.add(layers.Dense(1, activation='sigmoid'))
    model9.compile(
        optimizer=keras.optimizers.Adam(
            h.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='binary_crossentropy',
        metrics=['accuracy'])
    return model9
In [176]:
tuner_2 = RandomSearch(
    build_model_2,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    project_name='Job_Switch')
In [177]:
tuner_2.search_space_summary()
Search space summary
Default search space size: 4
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 10, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': 'linear'}
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': 'linear'}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}
In [178]:
tuner_2.search(X_sm, y_sm,
             epochs=5,
             validation_split = 0.2)
Trial 5 Complete [00h 01m 53s]
val_accuracy: 0.5544149776299795

Best val_accuracy So Far: 0.583737293879191
Total elapsed time: 00h 09m 03s
In [179]:
tuner_2.results_summary()
Results summary
Results in ./Job_Switch
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 3 summary
Hyperparameters:
num_layers: 5
units_0: 32
units_1: 64
learning_rate: 0.01
units_2: 96
units_3: 256
units_4: 256
units_5: 160
units_6: 192
units_7: 224
units_8: 224
Score: 0.583737293879191

Trial 2 summary
Hyperparameters:
num_layers: 9
units_0: 192
units_1: 64
learning_rate: 0.001
units_2: 160
units_3: 32
units_4: 224
units_5: 32
units_6: 256
units_7: 96
units_8: 192
Score: 0.5826917688051859

Trial 4 summary
Hyperparameters:
num_layers: 10
units_0: 128
units_1: 32
learning_rate: 0.0001
units_2: 160
units_3: 160
units_4: 160
units_5: 224
units_6: 96
units_7: 128
units_8: 96
units_9: 32
Score: 0.5544149776299795

Trial 1 summary
Hyperparameters:
num_layers: 5
units_0: 160
units_1: 160
learning_rate: 0.001
units_2: 224
units_3: 128
units_4: 224
units_5: 64
units_6: 160
units_7: 64
units_8: 32
Score: 0.5472388664881388

Trial 0 summary
Hyperparameters:
num_layers: 9
units_0: 224
units_1: 96
learning_rate: 0.001
units_2: 32
units_3: 32
units_4: 32
units_5: 32
units_6: 32
units_7: 32
units_8: 32
Score: 0.5050850709279379
In [180]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [182]:
model9 = Sequential()
model9.add(Dense(160,activation='relu',kernel_initializer='he_uniform',input_dim = x_train.shape[1]))
model9.add(Dense(160,activation='relu',kernel_initializer='he_uniform'))
model9.add(Dense(224,activation='relu',kernel_initializer='he_uniform'))
model9.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
model9.add(Dense(224,activation='relu',kernel_initializer='he_uniform'))
model9.add(Dense(1, activation = 'sigmoid'))
      #Compiling the ANN with Adam optimizer and binary cross entropy loss function
optimizer = tf.keras.optimizers.Adam(0.001)
model9.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
In [183]:
model9.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ (None, 160)                 │             800 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 160)                 │          25,760 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 224)                 │          36,064 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 128)                 │          28,800 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_4 (Dense)                      │ (None, 224)                 │          28,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_5 (Dense)                      │ (None, 1)                   │             225 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 120,545 (470.88 KB)
 Trainable params: 120,545 (470.88 KB)
 Non-trainable params: 0 (0.00 B)
In [184]:
history_9 = model9.fit(X_sm,y_sm,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 12s 12ms/step - accuracy: 0.5796 - loss: 9.5292 - val_accuracy: 0.0413 - val_loss: 1.6555
Epoch 2/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 8s 18ms/step - accuracy: 0.6332 - loss: 0.7373 - val_accuracy: 0.4256 - val_loss: 0.9889
Epoch 3/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 12ms/step - accuracy: 0.6364 - loss: 0.6541 - val_accuracy: 0.9011 - val_loss: 0.4919
Epoch 4/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 10ms/step - accuracy: 0.6426 - loss: 0.6220 - val_accuracy: 0.1459 - val_loss: 1.3278
Epoch 5/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 6s 14ms/step - accuracy: 0.6424 - loss: 0.6338 - val_accuracy: 0.4823 - val_loss: 0.8936
Epoch 6/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 7s 6ms/step - accuracy: 0.6404 - loss: 0.6905 - val_accuracy: 0.6109 - val_loss: 0.7520
Epoch 7/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6626 - loss: 0.5749 - val_accuracy: 0.5690 - val_loss: 0.7843
Epoch 8/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.6596 - loss: 0.5625 - val_accuracy: 0.4732 - val_loss: 0.8491
Epoch 9/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6651 - loss: 0.5669 - val_accuracy: 0.2119 - val_loss: 0.9939
Epoch 10/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6558 - loss: 0.5752 - val_accuracy: 0.4079 - val_loss: 0.8870
Epoch 11/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6552 - loss: 0.5732 - val_accuracy: 0.3775 - val_loss: 0.9108
Epoch 12/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.6655 - loss: 0.5665 - val_accuracy: 0.3534 - val_loss: 0.8681
Epoch 13/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6585 - loss: 0.5628 - val_accuracy: 0.5191 - val_loss: 0.8326
Epoch 14/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 7s 10ms/step - accuracy: 0.6693 - loss: 0.5629 - val_accuracy: 0.1868 - val_loss: 0.9918
Epoch 15/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.6654 - loss: 0.5637 - val_accuracy: 0.2167 - val_loss: 0.9838
Epoch 16/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6633 - loss: 0.5620 - val_accuracy: 0.5364 - val_loss: 0.8218
Epoch 17/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 6s 9ms/step - accuracy: 0.6712 - loss: 0.5548 - val_accuracy: 0.0867 - val_loss: 1.0600
Epoch 18/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.6676 - loss: 0.5571 - val_accuracy: 0.2127 - val_loss: 0.9782
Epoch 19/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6728 - loss: 0.5511 - val_accuracy: 0.4855 - val_loss: 0.8493
Epoch 20/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.6681 - loss: 0.5556 - val_accuracy: 0.5192 - val_loss: 0.8154
Epoch 21/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6689 - loss: 0.5528 - val_accuracy: 0.3663 - val_loss: 0.8720
Epoch 22/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6726 - loss: 0.5465 - val_accuracy: 0.3380 - val_loss: 0.8698
Epoch 23/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 7s 10ms/step - accuracy: 0.6766 - loss: 0.5463 - val_accuracy: 0.6012 - val_loss: 0.7625
Epoch 24/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6791 - loss: 0.5383 - val_accuracy: 0.5469 - val_loss: 0.7909
Epoch 25/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6787 - loss: 0.5373 - val_accuracy: 0.4872 - val_loss: 0.8019
Epoch 26/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.6789 - loss: 0.5359 - val_accuracy: 0.5056 - val_loss: 0.9265
Epoch 27/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.6635 - loss: 0.5601 - val_accuracy: 0.3228 - val_loss: 0.8857
Epoch 28/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6782 - loss: 0.5407 - val_accuracy: 0.3855 - val_loss: 0.8394
Epoch 29/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.6843 - loss: 0.5367 - val_accuracy: 0.5415 - val_loss: 0.7454
Epoch 30/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6861 - loss: 0.5330 - val_accuracy: 0.3282 - val_loss: 0.8989
Epoch 31/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.6832 - loss: 0.5391 - val_accuracy: 0.4155 - val_loss: 0.8373
Epoch 32/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6800 - loss: 0.5333 - val_accuracy: 0.4862 - val_loss: 0.7718
Epoch 33/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.6864 - loss: 0.5288 - val_accuracy: 0.6431 - val_loss: 0.6870
Epoch 34/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6841 - loss: 0.5297 - val_accuracy: 0.5378 - val_loss: 0.7827
Epoch 35/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6851 - loss: 0.5295 - val_accuracy: 0.5031 - val_loss: 0.7671
Epoch 36/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 7s 10ms/step - accuracy: 0.6856 - loss: 0.5273 - val_accuracy: 0.4869 - val_loss: 0.7723
Epoch 37/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.6893 - loss: 0.5288 - val_accuracy: 0.5811 - val_loss: 0.7202
Epoch 38/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6901 - loss: 0.5239 - val_accuracy: 0.5346 - val_loss: 0.6777
Epoch 39/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.6861 - loss: 0.5307 - val_accuracy: 0.5800 - val_loss: 0.7230
Epoch 40/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6914 - loss: 0.5245 - val_accuracy: 0.4882 - val_loss: 0.8090
Epoch 41/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6880 - loss: 0.5264 - val_accuracy: 0.5930 - val_loss: 0.7148
Epoch 42/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6869 - loss: 0.5317 - val_accuracy: 0.5014 - val_loss: 0.7830
Epoch 43/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.6866 - loss: 0.5231 - val_accuracy: 0.5895 - val_loss: 0.7439
Epoch 44/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6900 - loss: 0.5226 - val_accuracy: 0.5161 - val_loss: 0.7340
Epoch 45/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6887 - loss: 0.5197 - val_accuracy: 0.4712 - val_loss: 0.8267
Epoch 46/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6890 - loss: 0.5271 - val_accuracy: 0.5461 - val_loss: 0.6926
Epoch 47/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.6891 - loss: 0.5238 - val_accuracy: 0.5652 - val_loss: 0.7166
Epoch 48/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6968 - loss: 0.5160 - val_accuracy: 0.6517 - val_loss: 0.6701
Epoch 49/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6997 - loss: 0.5154 - val_accuracy: 0.6021 - val_loss: 0.7174
Epoch 50/50
439/439 ━━━━━━━━━━━━━━━━━━━━ 8s 13ms/step - accuracy: 0.6909 - loss: 0.5199 - val_accuracy: 0.5473 - val_loss: 0.7293
In [185]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_9.history['loss'])
plt.plot(history_9.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
  • Overfitting Indication: Training loss remains consistently lower than validation loss, suggesting potential overfitting.
  • Initial Convergence: Sharp loss drop in early epochs, indicating quick learning.
  • Validation Loss Fluctuation: Suggests model instability or noise in validation data.
In [187]:
from sklearn.metrics import roc_curve

from matplotlib import pyplot

# predict probabilities
yhat9 = model9.predict(x_test)
# keep probabilities for the positive outcome only
yhat9 = yhat9[:, 0]
# calculate roc curves
fpr, tpr, thresholds9 = roc_curve(y_test, yhat9)
# calculate the g-mean for each threshold
gmeans9 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans9)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds9[ix], gmeans9[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step
Best Threshold=0.391903, G-Mean=0.707
No description has been provided for this image
  • ROC Curve Analysis: The model performs better than random guessing but has room for improvement.
  • Best Threshold: 0.391903, optimizing the balance between sensitivity and specificity.
  • G-Mean: 0.707, indicating moderate classification performance.
In [188]:
y_pred_e9=model9.predict(x_test)
y_pred_e9 = (y_pred_e9 > thresholds9[ix])
y_pred_e9
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step
Out[188]:
array([[False],
       [ True],
       [ True],
       ...,
       [ True],
       [False],
       [False]])
In [189]:
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm9=confusion_matrix(y_test, y_pred_e9)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm9,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
No description has been provided for this image
In [190]:
#Accuracy as per the classification report
from sklearn import metrics
cr6=metrics.classification_report(y_test,y_pred_e9)
print(cr6)
              precision    recall  f1-score   support

           0       0.90      0.59      0.72      4384
           1       0.45      0.84      0.59      1755

    accuracy                           0.66      6139
   macro avg       0.68      0.72      0.65      6139
weighted avg       0.77      0.66      0.68      6139

  • The model favors identifying liver patients (high recall for class 1) but misclassifies many non-liver patients (high false positives).
  • Precision for liver patients is low, meaning many non-liver cases are misclassified as liver.

Model 10 - Grid Search CV¶

In [203]:
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
In [204]:
def create_model_v7():
    np.random.seed(1337)
    model10 = Sequential()
    model10.add(Dense(256,activation='relu',input_dim = x_train.shape[1]))
    model10.add(Dropout(0.3))
    #model10.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
    model10.add(Dense(128,activation='relu'))
    model10.add(Dropout(0.3))
    model10.add(Dense(64,activation='relu'))
    model10.add(Dropout(0.2))
    #model10.add(Dense(32,activation='relu',kernel_initializer='he_uniform'))
    #model10.add(Dropout(0.3))
    model10.add(Dense(32,activation='relu'))
    model10.add(Dense(1, activation='sigmoid'))

    #compile model
    optimizer = tf.keras.optimizers.Adam()
    model10.compile(optimizer = optimizer,loss = 'binary_crossentropy', metrics = ['accuracy'])
    return model10
In [205]:
keras_estimator = KerasClassifier(build_fn=create_model_v7, verbose=1)
In [206]:
# define the grid search parameters
batch_size= [32, 64, 128]
Learn_rate = [0.001,0.01,0.1]
param_grid = dict(optimizer__learning_rate=learn_rate, batch_size=batch_size)

kfold_splits = 3
grid = GridSearchCV(estimator=keras_estimator,
                    verbose=1,
                    cv=kfold_splits,
                    param_grid=param_grid,n_jobs=-1)
grid_result = grid.fit(x_train, y_train,validation_split=0.2,verbose=1)
Fitting 3 folds for each of 9 candidates, totalling 27 fits
614/614 ━━━━━━━━━━━━━━━━━━━━ 7s 6ms/step - accuracy: 0.6500 - loss: 1.5521 - val_accuracy: 0.7015 - val_loss: 0.5664
In [207]:
# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
Best: 0.717090 using {'batch_size': 32, 'optimizer__learning_rate': 0.01}
In [208]:
estimator_v7=create_model_v7()

estimator_v7.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense_5 (Dense)                      │ (None, 256)                 │           1,280 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_3 (Dropout)                  │ (None, 256)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_6 (Dense)                      │ (None, 128)                 │          32,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_4 (Dropout)                  │ (None, 128)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_7 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_5 (Dropout)                  │ (None, 64)                  │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_8 (Dense)                      │ (None, 32)                  │           2,080 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_9 (Dense)                      │ (None, 1)                   │              33 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 44,545 (174.00 KB)
 Trainable params: 44,545 (174.00 KB)
 Non-trainable params: 0 (0.00 B)
In [209]:
optimizer = tf.keras.optimizers.Adam()
estimator_v7.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
history_10=estimator_v7.fit(X_sm, y_sm, epochs=50, batch_size = grid_result.best_params_['batch_size'], verbose=1,validation_split=0.2)
Epoch 1/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.5869 - loss: 1.0732 - val_accuracy: 0.2904 - val_loss: 0.7896
Epoch 2/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.6308 - loss: 0.6085 - val_accuracy: 0.5627 - val_loss: 0.7659
Epoch 3/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6471 - loss: 0.5971 - val_accuracy: 0.5681 - val_loss: 0.7511
Epoch 4/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6544 - loss: 0.5800 - val_accuracy: 0.7233 - val_loss: 0.6949
Epoch 5/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6605 - loss: 0.5701 - val_accuracy: 0.7566 - val_loss: 0.6731
Epoch 6/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6684 - loss: 0.5642 - val_accuracy: 0.7313 - val_loss: 0.7021
Epoch 7/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6697 - loss: 0.5549 - val_accuracy: 0.7869 - val_loss: 0.6559
Epoch 8/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6767 - loss: 0.5506 - val_accuracy: 0.7515 - val_loss: 0.6650
Epoch 9/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6780 - loss: 0.5478 - val_accuracy: 0.6075 - val_loss: 0.7320
Epoch 10/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6834 - loss: 0.5457 - val_accuracy: 0.7338 - val_loss: 0.6597
Epoch 11/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.6812 - loss: 0.5448 - val_accuracy: 0.7705 - val_loss: 0.6376
Epoch 12/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6847 - loss: 0.5395 - val_accuracy: 0.7153 - val_loss: 0.6439
Epoch 13/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6828 - loss: 0.5413 - val_accuracy: 0.6851 - val_loss: 0.6377
Epoch 14/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6891 - loss: 0.5394 - val_accuracy: 0.6295 - val_loss: 0.6638
Epoch 15/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.6853 - loss: 0.5361 - val_accuracy: 0.6721 - val_loss: 0.6646
Epoch 16/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6896 - loss: 0.5363 - val_accuracy: 0.5974 - val_loss: 0.6442
Epoch 17/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.6908 - loss: 0.5338 - val_accuracy: 0.6392 - val_loss: 0.6563
Epoch 18/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.6904 - loss: 0.5314 - val_accuracy: 0.6632 - val_loss: 0.6290
Epoch 19/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6936 - loss: 0.5329 - val_accuracy: 0.6590 - val_loss: 0.6471
Epoch 20/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6967 - loss: 0.5291 - val_accuracy: 0.6888 - val_loss: 0.6175
Epoch 21/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6945 - loss: 0.5283 - val_accuracy: 0.7076 - val_loss: 0.6035
Epoch 22/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6953 - loss: 0.5286 - val_accuracy: 0.6200 - val_loss: 0.6685
Epoch 23/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6931 - loss: 0.5257 - val_accuracy: 0.7197 - val_loss: 0.5801
Epoch 24/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.6949 - loss: 0.5260 - val_accuracy: 0.7005 - val_loss: 0.6054
Epoch 25/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6917 - loss: 0.5236 - val_accuracy: 0.6309 - val_loss: 0.6635
Epoch 26/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6935 - loss: 0.5244 - val_accuracy: 0.6718 - val_loss: 0.6215
Epoch 27/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6971 - loss: 0.5244 - val_accuracy: 0.7080 - val_loss: 0.6353
Epoch 28/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6972 - loss: 0.5222 - val_accuracy: 0.6936 - val_loss: 0.6266
Epoch 29/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6959 - loss: 0.5209 - val_accuracy: 0.7237 - val_loss: 0.6125
Epoch 30/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.6986 - loss: 0.5212 - val_accuracy: 0.6593 - val_loss: 0.6267
Epoch 31/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6967 - loss: 0.5217 - val_accuracy: 0.6871 - val_loss: 0.6251
Epoch 32/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6966 - loss: 0.5193 - val_accuracy: 0.6893 - val_loss: 0.6200
Epoch 33/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6978 - loss: 0.5186 - val_accuracy: 0.6359 - val_loss: 0.6438
Epoch 34/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6952 - loss: 0.5198 - val_accuracy: 0.6457 - val_loss: 0.6777
Epoch 35/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6961 - loss: 0.5194 - val_accuracy: 0.6674 - val_loss: 0.6333
Epoch 36/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7010 - loss: 0.5170 - val_accuracy: 0.6389 - val_loss: 0.6311
Epoch 37/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6982 - loss: 0.5177 - val_accuracy: 0.7037 - val_loss: 0.6252
Epoch 38/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7016 - loss: 0.5156 - val_accuracy: 0.6933 - val_loss: 0.6136
Epoch 39/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7022 - loss: 0.5142 - val_accuracy: 0.7164 - val_loss: 0.6004
Epoch 40/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.6988 - loss: 0.5181 - val_accuracy: 0.7070 - val_loss: 0.5828
Epoch 41/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7000 - loss: 0.5147 - val_accuracy: 0.6654 - val_loss: 0.5982
Epoch 42/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7020 - loss: 0.5145 - val_accuracy: 0.6459 - val_loss: 0.6094
Epoch 43/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.7002 - loss: 0.5137 - val_accuracy: 0.7003 - val_loss: 0.6181
Epoch 44/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7002 - loss: 0.5154 - val_accuracy: 0.7287 - val_loss: 0.5773
Epoch 45/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.6983 - loss: 0.5160 - val_accuracy: 0.6727 - val_loss: 0.6070
Epoch 46/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7044 - loss: 0.5147 - val_accuracy: 0.7193 - val_loss: 0.5977
Epoch 47/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7017 - loss: 0.5128 - val_accuracy: 0.6553 - val_loss: 0.5912
Epoch 48/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7007 - loss: 0.5142 - val_accuracy: 0.6866 - val_loss: 0.5904
Epoch 49/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7023 - loss: 0.5114 - val_accuracy: 0.6169 - val_loss: 0.6382
Epoch 50/50
877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7015 - loss: 0.5126 - val_accuracy: 0.7141 - val_loss: 0.6035
In [210]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_10.history['loss'])
plt.plot(history_10.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
In [211]:
from sklearn.metrics import roc_curve

from matplotlib import pyplot

# predict probabilities
yhat10 = estimator_v7.predict(x_test)
# keep probabilities for the positive outcome only
yhat10 = yhat10[:, 0]
# calculate roc curves
fpr, tpr, thresholds10 = roc_curve(y_test, yhat10)
# calculate the g-mean for each threshold
gmeans10 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans10)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds10[ix], gmeans10[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Best Threshold=0.470850, G-Mean=0.729
No description has been provided for this image
  • G-Mean = 0.729 indicates a good balance between sensitivity and specificity.
  • Best Threshold = 0.47085, suggesting this threshold optimizes the trade-off between True Positive and False Positive rates.
In [213]:
y_pred_e10=estimator_v7.predict(x_test)
y_pred_e10 = (y_pred_e10 > thresholds10[ix])
y_pred_e10
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
Out[213]:
array([[False],
       [ True],
       [ True],
       ...,
       [ True],
       [False],
       [False]])
In [214]:
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm10=confusion_matrix(y_test, y_pred_e10)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm10,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
No description has been provided for this image
In [215]:
#Accuracy as per the classification report
from sklearn import metrics
cr10=metrics.classification_report(y_test,y_pred_e10)
print(cr10)
              precision    recall  f1-score   support

           0       0.90      0.65      0.75      4384
           1       0.48      0.81      0.60      1755

    accuracy                           0.70      6139
   macro avg       0.69      0.73      0.68      6139
weighted avg       0.78      0.70      0.71      6139

  • recision for class 1 (liver patients) is 0.48, meaning many false positives.
  • Recall for class 1 is 0.81, showing the model captures most actual liver cases.
  • Overall accuracy is 70%, but weighted scores indicate a better performance when considering class imbalance.
  • Macro F1-score (0.68) suggests the model performs better than a random classifier but could be improved

Metric Comparison¶

Model Performance Comparison & Ranking¶

Evaluation Metrics¶

The table below summarizes the key performance metrics of all models tested, including accuracy, precision, recall, F1-score, and G-Mean.

Model Name Accuracy Precision (Class 1) Recall (Class 1) F1-Score (Class 1) G-Mean Best Threshold
Model 1 - Baseline Logistic Regression 0.66 0.45 0.84 0.59 0.717 0.385
Model 2 - Random Forest Classifier 0.70 0.48 0.81 0.60 0.729 0.471
Model 3 - XGBoost Classifier 0.68 0.46 0.82 0.58 0.707 0.392

Ranking Based on Performance The ranking is based on G-Mean, which balances sensitivity and specificity, along with accuracy and F1-score.

  1. Model 2 - Random Forest Classifier 🏆 (Highest Accuracy: 0.70, Best G-Mean: 0.729)
  2. Model 3 - XGBoost Classifier (Balanced but slightly lower accuracy)
  3. Model 1 - Baseline Logistic Regression (Lowest accuracy and G-Mean)

Conclusion

  • Random Forest is the best-performing model based on accuracy, F1-score, and G-Mean. It provides the best balance between detecting liver disease and minimizing false positives.
  • Recommendation: Deploy Random Forest for production use and monitor performance regularly.
  • Further improvements can be achieved through feature engineering, hyperparameter tuning, and ensemble methods.

Business Recommendations¶

  • Early Detection & Preventive Screening

    • The model's predictive capabilities can assist healthcare providers in identifying high-risk individuals earlier, allowing for preventive interventions and lifestyle modifications.
  • Optimized Resource Allocation

    • Healthcare facilities can prioritize at-risk patients based on model predictions, ensuring efficient use of medical resources, such as diagnostic tests and specialist consultations.
  • Improved Patient Stratification

    • By leveraging model insights, hospitals can categorize patients into risk groups, enabling more personalized treatment plans and reducing unnecessary hospital visits.
  • Refinement of Model & Data Collection

    • To improve accuracy, the business should invest in gathering additional patient data, refining feature engineering, and experimenting with more advanced machine learning models.
  • Integration into Electronic Health Records (EHRs)

    • Deploying the model within hospital EHR systems can provide real-time risk assessments, helping physicians make data-driven decisions at the point of care.
  • Targeted Public Health Campaigns

    • The insights can be used to tailor awareness programs focusing on modifiable risk factors, such as alcohol consumption, obesity, and hepatitis infections.
  • Cost Reduction in Liver Disease Treatment

    • Early diagnosis leads to lower treatment costs by reducing complications, hospital admissions, and the need for advanced interventions like liver transplants.
  • Regulatory & Ethical Considerations

    • Ensure model transparency, fairness, and compliance with healthcare regulations like HIPAA or PHIPA to maintain patient trust and data privacy.
In [217]:
# google

path_ipynb = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Predicting_Liver_Disease_Using_Machine_Learning_A_Data_Driven_Approach_in_Binary_Classification.ipynb'
notebook_path = path_ipynb

!jupyter nbconvert --to html "{notebook_path}"

from google.colab import files
path_html = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Predicting_Liver_Disease_Using_Machine_Learning_A_Data_Driven_Approach_in_Binary_Classification.html'

files.download(path_html)
[NbConvertApp] Converting notebook /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Predicting_Liver_Disease_Using_Machine_Learning_A_Data_Driven_Approach_in_Binary_Classification.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 64 image(s).
[NbConvertApp] Writing 5331169 bytes to /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Predicting_Liver_Disease_Using_Machine_Learning_A_Data_Driven_Approach_in_Binary_Classification.html
In [216]: